
Replacing Hadoop with Your Laptop: The Case for Multiprocessing


Many companies work with big data, but distributed systems are complicated and come with real tradeoffs. In this talk, I'll walk through analyzing the same dataset three ways: in bash, with Python multiprocessing, and with PySpark running locally. I'll also discuss the tradeoffs you make as you move from a local environment to a distributed system.
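
To give a flavor of the multiprocessing approach, here is a minimal sketch. It is not code from the talk: the word-count task, the function names (`count_words`, `split_into_chunks`, `parallel_word_count`), and the sample data are all assumptions chosen for illustration. It splits the input into chunks, counts words in each chunk in a separate worker process, and merges the partial counts.

```python
from collections import Counter
from multiprocessing import Pool


def count_words(lines):
    """Count whitespace-separated words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, processes=4):
    """Map count_words over chunks in worker processes, then merge the results."""
    chunks = split_into_chunks(lines, processes)
    with Pool(processes=processes) as pool:
        partial_counts = pool.map(count_words, chunks)
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real dataset.
    sample = ["the quick brown fox", "the lazy dog", "the end"] * 1000
    print(parallel_word_count(sample).most_common(3))
```

The same map-then-merge shape is what a distributed framework applies across machines; on a single laptop it costs little more than the overhead of starting the worker processes and passing the chunks to them.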

