Description
Many companies work with big data, but distributed systems are complicated and involve many tradeoffs. In this talk, I'll walk through analyzing the same dataset three ways: with bash, with Python multiprocessing, and with PySpark running locally. I'll also discuss some of the tradeoffs you make as you move from a local environment to a distributed system.