Siuba and duckdb: Analyzing Everything Everywhere All at Once

Description

Every data analysis in Python starts with a big fork in the road: which DataFrame library should I use?

The DataFrame Decision locks you into different methods, with subtly different behavior:

- different table methods (e.g. polars .with_columns() vs pandas .assign())
- different column methods (e.g. polars .map_dict() vs pandas .map())
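To make the pandas side of that comparison concrete, here is a minimal sketch. The toy data and column names (species, weight_kg, weight_g) are made up for illustration and are not from the talk; only .assign() and .map() come from the abstract.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame(
    {"species": ["cat", "dog", "cat"], "weight_kg": [4.0, 20.0, 5.0]}
)

# Table method: pandas adds a column with .assign(), which returns a new
# DataFrame and leaves df untouched. (The abstract names polars
# .with_columns() as the counterpart.)
with_grams = df.assign(weight_g=df["weight_kg"] * 1000)

# Column method: pandas recodes values with .map(). Values missing from
# the dict ("dog" here) come back as NaN rather than raising an error.
# (The abstract names polars .map_dict() as the counterpart.)
sounds = df["species"].map({"cat": "meow"})
```

Details like the NaN for unmapped values are the kind of library-specific behavior you commit to once you pick a DataFrame library.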

In this talk, I'll discuss how siuba (a dplyr port to Python) combines with duckdb (a crazy powerful SQL engine) to provide a unified, dplyr-like interface for analyzing a wide range of data sources: pandas and polars DataFrames, parquet files in a cloud bucket, or pins on Posit Connect.

Finally, I'll discuss recent experiments to more tightly integrate siuba and duckdb.
