Description
When I first began working with the Python Pandas library, I was told by an experienced Python engineer: "Pandas is fine for prototyping a bit of calculations, but it's too slow for any time-sensitive applications." Over multiple years of working with the Pandas library, I have realized that this was only true if not enough care is put into identifying proper ways to optimize the code's performance. This talk will review some of the most common beginner pitfalls that can cause otherwise perfectly good Pandas code to grind to a screeching halt, and walk through a set of tips and tricks to avoid them. Using a series of examples, we will review the process for identifying the elements of the code that may be causing a slowdown, and discuss a series of optimizations, ranging from good practices of input data storage and reading, to the best methods for avoiding inefficient iterations, to using the power of vectorization to optimize functions for Pandas dataframes.