Description
Our lab specializes in hyperspectral imaging using a spectral imager that combines tunable filters with colour sensors. Compared to simpler, more established imaging systems, this poses some unique challenges for data processing. In particular, many of the original imaging parameters need to be preserved and joined with calibration-derived values to actually compute radiance values from the raw sensor data, since the hardware does not handle them automatically. Keeping this metadata together with the resulting hyperspectral images produces combined datasets consisting of a large 3-dimensional datacube and multiple smaller 2D and 1D arrays with linked dimensions.
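As a rough illustration of such a combined dataset, the sketch below builds a small Xarray `Dataset` holding a 3D datacube alongside 2D and 1D arrays that share its dimensions. All variable names (`dn`, `dark`, `exposure`, `gain`), the sizes, and the linear radiance formula are assumptions made for this example; they are not taken from our library or from any real calibration.

```python
import numpy as np
import xarray as xr

# Hypothetical sizes and values, chosen only to keep the example small.
n_band, n_y, n_x = 4, 3, 5
rng = np.random.default_rng(0)

ds = xr.Dataset(
    data_vars={
        # Large 3D datacube of raw sensor values (digital numbers).
        "dn": (
            ("band", "y", "x"),
            rng.integers(0, 4096, (n_band, n_y, n_x)).astype(float),
        ),
        # 2D calibration-derived array sharing the spatial dimensions.
        "dark": (("y", "x"), np.full((n_y, n_x), 64.0)),
        # 1D per-band imaging parameter and calibration value.
        "exposure": ("band", np.array([10.0, 10.0, 20.0, 20.0])),
        "gain": ("band", np.array([0.5, 0.6, 0.7, 0.8])),
    },
    # Wavelength is attached to the band dimension as a coordinate.
    coords={"wavelength": ("band", np.array([500.0, 550.0, 600.0, 650.0]))},
)

# Placeholder radiance model (an assumption): Xarray aligns the 3D, 2D and
# 1D arrays by dimension name, so no manual reshaping is needed.
ds["radiance"] = (ds["dn"] - ds["dark"]) * ds["gain"] / ds["exposure"]
```

Because the arrays are linked by named dimensions, the result keeps the `band`, `y` and `x` dimensions and carries the `wavelength` coordinate along with it.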
We have built our solution to this problem on Xarray, which handles the multiple arrays of data, and on its existing Dask integration, which provides easy parallelization of the required preprocessing. Xarray also gives us many other advantages, such as:
- Exploration of very complex multi-dimensional datasets (especially when utilizing HoloViews)
- Interoperability with the scikit ecosystem
- Serialization to NetCDF preserving all the data in a single file
However, our extensive and somewhat unconventional use of Xarray also brings out its shortcomings when developing a library such as ours, including indexing issues with multiple, possibly overlapping coordinates and performance issues with complex datasets.
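One situation of this kind, sketched here under our own assumptions rather than taken from the library, is a `band` dimension that carries both a band index and a `wavelength` coordinate. Only the dimension coordinate can be used directly for label-based selection, so selecting by the other coordinate needs an extra step such as `swap_dims` or a mask. The array contents and coordinate values below are hypothetical.

```python
import numpy as np
import xarray as xr

# Small hypothetical cube with two coordinates along the same dimension.
cube = xr.DataArray(
    np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2),
    dims=("band", "y", "x"),
    coords={
        "band": [0, 1, 2, 3],
        "wavelength": ("band", [500.0, 550.0, 600.0, 650.0]),
    },
)

# Selecting by the dimension coordinate works directly.
by_index = cube.sel(band=2)

# To select by wavelength, make it the dimension coordinate first.
by_wavelength = cube.swap_dims({"band": "wavelength"}).sel(wavelength=600.0)

# Alternatively, mask on the secondary coordinate and drop the rest.
long_wavelengths = cube.where(cube["wavelength"] >= 600.0, drop=True)
```

Both selections return the same 2D slice; the difference is only in which coordinate has to be promoted before indexing.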
We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python, utilizing Xarray for metadata preservation from start to finish.