Contribute Media
A thank you to everyone who makes this possible: Read More

Going parallel and larger-than-memory with graphs

Description

Dask is an open source, pure python library that enables parallel larger-than-memory computation in a novel way. We represent programs as Directed Acyclic Graphs (DAG) of function calls. These graphs are executed by dask's schedulers with different optimizations (synchronous, threaded, parallel, distributed-memory). Dask has modules geared towards data analysis, which provide a friendly interface to building graps. One module, dask.array, mimics a subset of NumPy operations. With dask.array we can work with NumPy like arrays that are larger than RAM and parallelization comes for free by leveraging the underlying DAG.

Details

Improve this page