Visualizing Data Analysis Pipelines with Pandas Tutor and Tidy Data Tutor

Description

The data frame is a fundamental data structure for data scientists working in Python and R. Both pandas and the tidyverse are designed around building pipelines that transform data frames one step at a time. However, within these pipelines it is not always clear how each operation changes the underlying data frame. To explain each step in a pipeline, data science instructors often resort to hand-drawn diagrams that illustrate the semantics of operations such as filtering, sorting, and grouping.
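As a rough illustration of the problem, here is a pandas pipeline that filters, sorts, and then groups and aggregates a small data frame. It records the intermediate data frame after each step, which is the kind of trace an instructor would otherwise draw by hand. The data, the column names, and the `history` list are hypothetical. This is a manual approach for illustration only and is not how Pandas Tutor itself works.

```python
import pandas as pd

# Hypothetical example data (not from the talk).
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird", "dog"],
    "weight": [4.0, 12.5, 3.5, 0.3, 20.0],
})

# Each pipeline step is a (label, function) pair; the labels are illustrative.
steps = [
    ("filter weight > 1.0", lambda d: d[d["weight"] > 1.0]),
    ("sort by weight", lambda d: d.sort_values("weight")),
    ("mean weight per species",
     lambda d: d.groupby("species", as_index=False)["weight"].mean()),
]

# Run the steps one by one, keeping every intermediate frame.
history = []
current = df
for label, step in steps:
    current = step(current)
    history.append((label, current))
    print(f"--- after {label}: {len(current)} rows")
    print(current)
```

The filtered and sorted frames keep their original index labels (after sorting, the index is `[2, 0, 1, 4]`), which is one way to see which input row each output row came from. The grouping step, by contrast, collapses several input rows into a single output row, and that relationship is not visible from the printed result alone.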

In this talk, I will introduce Pandas Tutor and Tidy Data Tutor, two engines that generate step-by-step visual representations of data frame transformations. Both tools illustrate the row-, column-, and cell-wise relationships between each operation's input and output data frames.
