Description
In this talk I will discuss how to use Luigi, a tool written by Spotify to manage pipelines of long-running batch processes. Luigi lets you write those processes in pure Python and provides built-in support for Hadoop MapReduce jobs, along with abstractions for HDFS and local file systems. I'll show how to use Luigi to run a simple set of chained MapReduce jobs and how it helps with dependency resolution, workflow management, and visualization.
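To give a rough feel for what dependency resolution means here, the following is a minimal sketch in plain Python. It is a toy stand-in and does not import Luigi. The method names `requires`, `output`, `run`, and `complete` mirror Luigi's task interface, but the `Task` base class, the `build` helper, the two example tasks, and the file names are all hypothetical and exist only for illustration. The word count runs locally on a small file rather than on Hadoop.

```python
import os
import tempfile
from collections import Counter


class Task:
    """Toy pipeline task (hypothetical; this is not luigi.Task)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Upstream tasks that must finish before this one runs.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class WriteCorpus(Task):
    """First step: produce a small input file."""

    def output(self):
        return os.path.join(self.workdir, "corpus.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("spam eggs spam\nham spam eggs\n")


class CountWords(Task):
    """Second step: count the words written by WriteCorpus."""

    def requires(self):
        return [WriteCorpus(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "counts.tsv")

    def run(self):
        with open(self.requires()[0].output()) as f:
            counts = Counter(f.read().split())
        with open(self.output(), "w") as f:
            for word, n in sorted(counts.items()):
                f.write(f"{word}\t{n}\n")


def build(task, log):
    """Run upstream tasks first, skipping any task whose output exists."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first, second = [], []
        build(CountWords(workdir), first)
        build(CountWords(workdir), second)  # outputs exist, so nothing reruns
        with open(CountWords(workdir).output()) as f:
            return first, second, f.read()
```

Calling `demo()` asks only for the final task; the helper walks `requires()` to run the upstream step first, and a second request does no work because the output files already exist. Luigi itself adds scheduling, Hadoop and HDFS targets, and a visualizer on top of this basic idea.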