
Dive into Scrapy

Description

Juan Riaza - Dive into Scrapy [EuroPython 2015] [21 July 2015] [Bilbao, Euskadi, Spain]

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

This talk shows some advanced techniques drawn from how Scrapy is used at Scrapinghub.

Goals:

  • Understand why it's necessary to Scrapy-ify early on.
  • Anatomy of a Scrapy Spider.
  • Using the interactive shell.
  • What are items and how to use item loaders.
  • Examples of pipelines and middlewares.
  • Techniques to avoid getting banned.
  • How to deploy Scrapy projects.
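
As a rough illustration of the pipeline idea listed above, here is a minimal sketch of a Scrapy item pipeline. It is not taken from the talk. In Scrapy, a pipeline is a plain class whose process_item(self, item, spider) method is called for every scraped item and returns the (possibly modified) item. The class name PriceCleaningPipeline and the "price" field are hypothetical, and the sketch assumes items are dicts.

```python
import re


class PriceCleaningPipeline:
    """Hypothetical pipeline: turns a scraped 'price' string into a float."""

    # First number in the text, e.g. "1,299.50" inside "£1,299.50".
    _number = re.compile(r"\d[\d,]*(?:\.\d+)?")

    def process_item(self, item, spider):
        raw = item.get("price")  # assumed field name, for illustration only
        if isinstance(raw, str):
            match = self._number.search(raw)
            # Drop thousands separators before converting; use None if no number is found.
            item["price"] = float(match.group().replace(",", "")) if match else None
        # Returning the item passes it on to the next enabled pipeline.
        return item
```

A pipeline like this would be enabled in the project settings, e.g. ITEM_PIPELINES = {"myproject.pipelines.PriceCleaningPipeline": 300}. Here "myproject" is a placeholder module path, and lower numbers run earlier.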
