Hassle Free ETL with PySpark

Description

While data science models get all the press, the real work lies in the maze of data preprocessing and pipelines. This talk offers a glimpse of how you can use Python and the distributed power of Spark to simplify your (data) life, ditch the ETL boilerplate, and get to the insights. We’ll introduce PySpark and cover considerations for ETL jobs with respect to code structure and performance.
