Description
Dask is a pure python library for parallel and distributed computing. It's designed with flexibility in mind, making it easy to parallelize the complicated workflows often found in science. However, once you get something working, how do you debug or profile it? In this talk we'll cover the various tools Dask provides for diagnosing bugs and bottlenecks, as well as tips for resolving these issues.
Abstract
Dask is a pure python library for parallel and distributed computing. It's designed with simplicity and flexibility in mind, making it easy to parallelize the complicated workflows often found in science. However, once you get something working, how do you debug or profile it? Debugging and profiling parallel code is notoriously hard! In this talk we'll cover the various tools Dask provides for diagnosing bugs and performance bottlenecks, as well as tips and techniques for resolving these issues.
Starting with an example single-threaded probram, we'll walk through adding Dask to parallelize it, and then iterate on this example to gradually improve performance throughout the talk. Attendees should leave having a better understanding of:
- What tools Dask provides for debugging on a single machine
- How Dask uses IPython to make debugging distributed computations easy
- How to profile Dask code both on a single machine and on a cluster
- How to interpret the graphs presented in the Dask Dashboard
- Performance techniques for attacking bottlenecks once they've been identified.