Description
How do you build a big data pipeline when the size of your data starts to get out of hand? How do you improve on the initial design when people demand more data, faster?
This talk covers the evolution of a data pipeline in Python, from a daily full load to an up-to-the-minute event stream, based on our experience at FanDuel. Technologies covered include Amazon EMR, Redshift, Hadoop, Luigi, Spark, and Kinesis. We also look at the challenges and trade-offs involved, and at building a big data engineering team.