Python: ScrapyFromScript

Che-Jui Huang
2 min read · May 15, 2021
Photo by HalGatewood.com on Unsplash

Background:

I was asked to study web scraping and crawling at work. Specifically, my task was to collect all the text data from a given website domain, and I had no idea where to start. Then I came across a Python tool called Scrapy!
Link to the official documentation:

Great Tutorial to Start:

I followed the tutorials from this author. I think the author did a great job of introducing the basics of Scrapy, so I highly recommend checking them out!

Main Content:

If you look for Scrapy tutorials online, you will soon find that few articles cover how to run Scrapy from a script.

One reason is that Scrapy already ships with a well-designed command-line workflow (the `scrapy crawl` command inside a Scrapy project), so most people never need anything else. Still, to offer an alternative, I have created demos for you to play with. The Python script builds on the tutorials listed above, so please go through them before proceeding to my repo.
Good luck and have fun!

Assumptions

1. You have a basic understanding of HTML and know how to inspect a website.
2. You have Python installed and know how to run Python from the CLI.

What you will get out of this REPO

Learn to run Scrapy from the CLI (python WebScraper.py) in two scenarios:
1. Single-page scraping
2. Multi-page crawling

Sample Output

{"title": "It's Only the Himalayas",
"UPC": "a22124811bfa8350",
"IMG": "http://books.toscrape.com/media/cache/6d/41/6d418a73cc7d4ecfd75ca11d854041db.jpg",
"stars": " Two",
"des": "Wherever you go, whatever you do, just . . .",
"price": "£45.17",
"price_inc_tax": "£45.17",
"instock": "In stock (19 available)"},
