Description
Modern Data Science: A new approach to DataFrames and pipelines Maarten Breddels, Jovan Veljanoski Audience level:
We show how to deal with massive datasets using small resources using the Python Vaex DataFrame library. Using computational graphs, efficient algorithms and storage (Apache Arrow / hdf5) Vaex can easily handle up to a billion rows, even on your laptop. As a bonus, Vaex can automatically generate a Machine Learning pipeline using the graph structure build-up internally in the DataFrame.