From Pandas to PySpark

Description

Tired of waiting for massive datasets to load on your local machine? In this beginner-friendly tutorial, we’ll explore how to scale your data analysis skills from pandas to PySpark using a real-world anime dataset. We’ll walk through the basics of distributed computing, discuss why Spark was created, and demonstrate the benefits of working with PySpark for big data tasks—including reading, cleaning, and transforming millions of records with ease. By the end of this workshop, you’ll understand how PySpark harnesses cluster computing to handle large-scale data and you’ll be comfortable applying these techniques to your own projects.

Participant Requirements:

- A laptop (any OS) with an internet connection
- A Google account (to access Colab notebooks and slides)
- Familiarity with Python and pandas

Here's the link to the Google Colab to follow along 👇🏾 https://colab.research.google.com/drive/1fi0cTQ1NIE5kDEH0ynp2sqDuVeiBJJWU?usp=sharing

Here are the slides 👇🏾 https://drive.google.com/file/d/11JIih1VzLxTJ9O6PeGzqD_e8vumTZQmw/view?usp=sharing
