Description
Dask is a popular Python library for scaling and parallelizing Python code, either on a single machine or across a cluster. It provides familiar, high-level interfaces that extend the PyData ecosystem (e.g. NumPy, pandas, scikit-learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. In this tutorial we’ll cover advanced Dask features such as task graph optimization, the worker and scheduler plugin system, and how to inspect the internal state of a cluster. Attendees should walk away with a deeper understanding of Dask’s internals, familiarity with these advanced features, and ideas for applying them effectively to their own data-intensive workloads.