Description
Every data analysis in Python starts with a big fork in the road: which DataFrame library should I use?
The DataFrame Decision locks you into different methods, with subtly different behavior:
- different table methods (e.g. polars .with_columns() vs pandas .assign())
- different column methods (e.g. polars .map_dict() vs pandas .map())
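As a rough illustration of the pandas side of that fork, here is a minimal sketch. The column names, values, and label mapping are made up for the example and are not from the talk. The polars counterparts named above appear only in comments, since polars method names have shifted between releases.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame({"group": ["a", "b", "a"], "mass_g": [1000, 2500, 4000]})

# Table method: pandas adds a column with .assign(), which returns a new
# DataFrame and leaves `df` untouched.
# (The polars counterpart named above is .with_columns().)
with_kg = df.assign(mass_kg=lambda d: d["mass_g"] / 1000)

# Column method: pandas recodes values with Series.map() and a dict.
# (The polars counterpart named above is .map_dict().)
labels = {"a": "alpha", "b": "beta"}
recoded = with_kg.assign(group=lambda d: d["group"].map(labels))

print(recoded)
```

The same two steps in polars would use different method names and expression syntax, which is the lock-in described above.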
In this talk, I'll discuss how siuba (a dplyr port to Python) combines with duckdb (a crazy powerful SQL engine) to provide a unified, dplyr-like interface for analyzing a wide range of data sources, whether pandas and polars DataFrames, parquet files in a cloud bucket, or pins on Posit Connect.
Finally, I'll discuss recent experiments to more tightly integrate siuba and duckdb.