Description
Pandas provides a powerful set of functions to compute statistics on rolling windows. We will go beyond the convenient interface, and peek at the algorithmic gems that implement these operations efficiently: summed area tables, Welford's method, skip lists, ring buffers, deques... will all get their minute of fame, and attendees may learn a thing or two they can use in their everyday coding.
Abstract
The talk will cover:
- A quick overview of the Rolling and Expanding objects in pandas. Why straightforward implementations, even clever ones using advanced NumPy tricks, are inefficient.
- How is Series.rolling().sum() implemented. How the same idea can be extended to higher dimensions with the help of the inclusion-exclusion principle to yield summed area tables.
- How are Series.rolling().var() and friends implemented. The importance of numerically stable algorithms: why Welford's method is the better choice, even if it's a little slower. How can the same ideas be generalized to higher order moments, and used to parallelize computations.
- How are Series.rolling().max() and .min() implemented. The beauty of clever algorithms. The use of specialized data structures: a ring buffer as a deque.
- How are Series.rolling().median() and .quantile() implemented. More specialized data structures: the skip list, or why randomization can be your friend.