
Validating and Testing R Dataframes with Pandera via reticulate

Description

Original Full Title: Validating and Testing R Dataframes with Pandera via reticulate: A Case Study in R-Python Interoperability

Data science and machine learning practitioners work with data every day, analyzing and modeling it for insights and predictions. A major component of any project is data quality: the process of cleaning data and protecting against flaws that may invalidate the analysis or model. Pandera is an open source data testing toolkit for dataframes in the Python ecosystem, but can it validate R dataframes?

This talk has three parts. First, I'll describe what data testing is and why you need it. Then, I'll introduce the iterative process of creating and refining dataframe schemas in Pandera. Finally, I'll demonstrate how to use Pandera in R with the reticulate package, using a simple modeling exercise as an example.
