dask-image: distributed image processing for large data


Genevieve Buckley

Image datasets are large, and becoming larger. The widely used benchmark dataset COCO (Common Objects in Context) contains 330,000 individual images. The average size of a single entry on the image database EMPIAR is over 1TB, and entries can easily reach several terabytes. Even where individual images are small enough to fit in memory, many existing parallelization methods are difficult to scale seamlessly between a laptop and a supercomputing cluster. For instance, the Python multiprocessing module is restricted to a single node and can't take advantage of multiple compute nodes on a distributed supercomputing cluster.

We need easy ways to work with large image data. This talk introduces dask-image, a Python library for distributed image processing. The target audience is Python programmers currently using NumPy and SciPy with large array data, where the whole dataset cannot fit in memory or is close to that limit. It's for people who want to get started with parallel processing, either because they have large single-image data, or because they want to do batch processing, applying the same analysis to many smaller images (sometimes known as an embarrassingly parallel problem). The specific image analysis functions provided by dask-image are of broad interest to a diverse range of analysis applications including (but not limited to) video/streaming data, computer vision, and scientific fields including astronomy, microscopy and geosciences.
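As a rough illustration of the batch-processing case, the sketch below applies the same measurement to a stack of small images with a chunked Dask array. The synthetic data, the names `stack_np` and `stack`, and the choice of one chunk per image are assumptions made for the example and are not taken from the talk.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for a batch of 8 small images (assumption: real data
# would be loaded from disk instead).
rng = np.random.default_rng(0)
stack_np = rng.random((8, 64, 64))

# Wrap the NumPy array in a Dask array with one chunk per image, so each
# image can be processed independently (an embarrassingly parallel layout).
stack = da.from_array(stack_np, chunks=(1, 64, 64))

# The same NumPy-style call works on the Dask array, but it is lazy:
# this line only builds a task graph.
per_image_mean = stack.mean(axis=(1, 2))

# compute() runs the graph, using whichever scheduler is active
# (local threads by default, or a distributed cluster if one is configured).
result = per_image_mean.compute()
print(result.shape)
```

The analysis code stays the same whether it runs on a laptop or on a cluster; only the scheduler behind `compute()` changes.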

Specifically, this talk will cover:

  • An overview of the dask-image library
    • Lazy image loading
    • Image pre-processing functionality (convolutions, filters, etc.)
    • Analysis of segmented images (distributed labeling, and measurements of those label regions)
  • Mixing in your own custom analysis functions (using dask delayed, map_blocks, and map_overlap)

  • A practical case study of a Python image processing pipeline

dask-image is open source, released under a BSD 3-Clause license, and can be installed using conda or pip. You can find the source code at and the quickstart guide at

Produced by NDV:

Python, PyCon, PyConAU, PyConline

Fri Sep 4 13:55:00 2020 at Curlyboi
