
The Sum of Nothing

Description

The [release](https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-22-0-december-29-2017) of pandas version 0.22.0 in December 2017 introduced a major, API-breaking change. As someone who works with missing data quite a lot, I was particularly confused and somewhat dismayed by its "new" treatment of NaNs ("null values"). Specifically:

  • the sum of a series of NaNs was now 0
  • the product of a series of NaNs was now 1

In the previous version, these values were NaN, which I thought was the "right" way to do things. After all, how can the sum (or product) of nothing turn into something? I went on a journey (or maybe the proper term is "rabbit hole") to explore this question: I dug through historical GitHub issue threads and pandas-dev mailing list messages, even contacted a core pandas developer, and looked up how other programming languages, such as R, handle the same issue.

I learned that really, it all just comes down to math.
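One standard way to see the math is the following. This is my framing and not necessarily the argument made in the talk. Suppose we want "splitting the data into two disjoint pieces A and B and combining the partial results" to always give the same answer as reducing everything at once. Taking B to be empty then forces the empty sum to be 0. The same move forces the empty product to be 1, for any A whose product is nonzero.

```latex
\sum_{i \in A \cup B} x_i = \sum_{i \in A} x_i + \sum_{i \in B} x_i
\;\xrightarrow{\;B = \emptyset\;}\;
\sum_{i \in \emptyset} x_i = 0

\prod_{i \in A \cup B} x_i = \prod_{i \in A} x_i \cdot \prod_{i \in B} x_i
\;\xrightarrow{\;B = \emptyset\;}\;
\prod_{i \in \emptyset} x_i = 1
```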

In this talk, I'll make the case that while the current behavior is mathematically consistent, it is often counterintuitive. Because who says math is supposed to make sense at first glance?
