Description
Recent versions of Pandas have warned users of the imminent deprecation of the Panel, Pandas' namesake data structure for storing three-dimensional data. This talk will examine the tradeoffs in performance and interface between two types of Panel alternatives: using hierarchical indices in Pandas and StaticFrame, or using true n-dimensional arrays in NumPy or xarray.
This talk will aid those working in data science and related fields by examining the tradeoffs between working with data in true multidimensional data structures (i.e., NumPy and xarray) versus working with hierarchical index implementations on one- or two-dimensional data.
The immediate point of departure is Pandas' imminent deprecation of the Panel: for Pandas users who have used the Panel, this talk will illustrate how to transition away from it and what tradeoffs the alternatives involve.
This talk will explain how hierarchical indices work by comparing two implementations: the Pandas MultiIndex and the StaticFrame IndexHierarchy. The StaticFrame IndexHierarchy offers a new, independent implementation of hierarchical indices that deviates from Pandas in significant ways: the index is literally composed of other index objects, permitting usage of specialized index types (such as datetime indices), efficient memory usage of shared immutable objects, and the enforcement of a strict tree graph.
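As a point of reference for the Pandas side of that comparison, the minimal sketch below builds a small MultiIndex and inspects its `levels` and `codes` attributes. The labels and level names are made up for illustration, and the StaticFrame IndexHierarchy is not shown here.

```python
import pandas as pd

# Illustrative labels only: a two-level index with 2 x 3 = 6 entries.
mi = pd.MultiIndex.from_product(
    [['a', 'b'], [1, 2, 3]],
    names=['outer', 'inner'],
)

# Each level stores its unique labels once...
outer_labels = mi.levels[0].tolist()
inner_labels = mi.levels[1].tolist()

# ...and each row position is encoded as an integer offset into its level.
outer_codes = mi.codes[0].tolist()
inner_codes = mi.codes[1].tolist()

print(outer_labels, outer_codes)
print(inner_labels, inner_codes)
```

The label at any position can be recovered by looking up each code in its level, which is one way to see how a flat index can carry hierarchical structure.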
After demonstrating how hierarchical indices can support higher-dimensional data in one- or two-dimensional arrays, the talk will demonstrate the power and flexibility of selecting and slicing data with hierarchical indices.
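To make the idea concrete, here is a small sketch, under assumed shapes and labels, of storing a three-dimensional block in a two-dimensional Pandas DataFrame by folding the first two axes into a hierarchical row index. Each selection is paired with the equivalent NumPy indexing expression.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 items x 3 dates x 4 fields.
values = np.arange(24).reshape(2, 3, 4)
items = ['a', 'b']
dates = pd.date_range('2019-01-01', periods=3)
fields = ['w', 'x', 'y', 'z']

# Fold the first two axes into a two-level row index.
index = pd.MultiIndex.from_product([items, dates], names=['item', 'date'])
frame = pd.DataFrame(values.reshape(6, 4), index=index, columns=fields)

# Select everything for one item; corresponds to values[1].
slab = frame.loc['b']

# Cross-section on the inner level; corresponds to values[:, 0, :].
first_date = frame.xs(dates[0], level='date')

# Select a single cell; corresponds to values[1, 2, 2].
cell = frame.loc[('b', dates[2]), 'y']
```

The hierarchical version selects by label rather than by position, at the cost of carrying and searching an index object.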
The talk will close with performance analysis, isolating the overhead of using hierarchical indices over true multidimensional array representations, and comparing the performance of selection, slicing, grouping, and function application on hierarchical data in NumPy, xarray, Pandas, and StaticFrame.
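One way such an overhead could be measured is sketched below with the standard-library `timeit` module, comparing the same selection on a NumPy array and on a Pandas DataFrame with a MultiIndex. The shapes and repetition count are arbitrary assumptions, and no results are implied; timings vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary illustrative shape: 50 x 100 x 10.
values = np.random.rand(50, 100, 10)
index = pd.MultiIndex.from_product([range(50), range(100)])
frame = pd.DataFrame(values.reshape(5000, 10), index=index)

# Time the same logical selection (one outer-level slab) in both forms.
t_numpy = timeit.timeit(lambda: values[25], number=1000)
t_pandas = timeit.timeit(lambda: frame.loc[25], number=1000)

print(f'NumPy:  {t_numpy:.6f} s for 1000 selections')
print(f'Pandas: {t_pandas:.6f} s for 1000 selections')
```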
This talk is aimed at both beginners who are new to hierarchical indices and more advanced users interested in interface design and performance tradeoffs. Basic familiarity with NumPy and Pandas is expected. Audience members will leave with a better understanding of how hierarchical indices work and what tradeoffs are made when using them.