Description
Scientific discoveries are increasingly driven by the analysis of large volumes of image data, and many tools and systems have emerged to support distributed data storage and scalable computation. It is not always clear, however, how well these systems support real-world scientific use cases. Our team evaluated the performance and ease of use of five such systems (SciDB, Myria, Spark, Dask, and TensorFlow) on real-world image analysis pipelines drawn from astronomy and neuroscience. We find that each tool has distinct advantages and shortcomings, which point to new research opportunities in making large-scale scientific image analysis both efficient and easy to use.