Description
Extracting structured information from a webpage is a relatively
simple task in Python, given the many tools at our disposal, such as
BeautifulSoup, PyQuery and lxml. However, crawling and scraping data
from multiple websites makes the job difficult, because everyone on
the internet likes to structure their information differently.
Crawling up to 10 portals is manageable; beyond that it becomes a
menace. What we need, then, is a framework that keeps the crawling
and parsing logic separate and also helps manage the parsers. This is
where Scrapy comes to our assistance. It is arguably the most
Pythonic way of scraping the web.
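
To make the idea of separating crawling from parsing concrete, here
is a minimal sketch in plain Python using only the standard library.
It is not Scrapy's API. The registry `PARSERS`, the decorator
`parser_for`, the helper `tag_text`, the `crawl` function and the
`news.example` / `shop.example` hosts are all hypothetical names
chosen for illustration. Each site gets its own small parser, and one
shared crawl loop looks up the right parser by hostname.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical registry: maps a site's hostname to its parser function.
PARSERS = {}


def parser_for(host):
    """Register the decorated function as the parser for one site."""
    def register(func):
        PARSERS[host] = func
        return func
    return register


class _TagText(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def tag_text(html, tag):
    """Return the stripped text of each <tag> element in the HTML."""
    collector = _TagText(tag)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.found]


# Parsing logic: one small function per site, unaware of how pages are fetched.
@parser_for("news.example")
def parse_news(html):
    return {"headlines": tag_text(html, "h2")}


@parser_for("shop.example")
def parse_shop(html):
    return {"products": tag_text(html, "li")}


def crawl(urls, fetch):
    """Crawling logic: fetch each page, then hand it to its site's parser."""
    for url in urls:
        parser = PARSERS.get(urlparse(url).hostname)
        if parser is None:
            continue  # no parser registered for this site
        yield url, parser(fetch(url))


# Canned pages stand in for real HTTP responses in this sketch.
PAGES = {
    "http://news.example/": "<h1>News</h1><h2>First story</h2><h2>Second story</h2>",
    "http://shop.example/": "<ul><li>Kettle</li><li>Teapot</li></ul>",
}
results = dict(crawl(PAGES, PAGES.get))
```

Supporting another portal in this sketch means writing one more
decorated parser function; the crawl loop itself does not change. A
framework such as Scrapy provides this kind of separation for you,
along with the actual fetching.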