Getting started with HDF5 and PyTables

Description

HDF5 is a data model, a library, and a file format for storing and managing large, complex data. PyTables is a Python package built on top of the HDF5 library and NumPy. It provides a high-level interface with advanced indexing and database-like query capabilities. PyTables is both easy to use and extremely fast, so it can be an invaluable tool if you need to work with large, hierarchical datasets. By the end of this talk you will know what HDF5 is, why it might be the right file format for you, and where PyTables fits in the Python data ecosystem.

Outline:
- What is HDF5 and who uses it?
- Brief overview of the HDF5 data model
- First steps with PyTables
- PyTables tools
- Search big data with PyTables and NumExpr
- Additional resources to learn more
- Q&A
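
As a rough idea of what the "first steps" and "search" items in the outline might look like in practice, here is a minimal sketch, not taken from the talk itself. It creates an HDF5 file with one group and one table, then runs an in-kernel query whose condition string PyTables evaluates with NumExpr. The `Reading` schema, the `lab` group, the `readings` table, and the `build_and_query` helper are hypothetical names chosen for illustration.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical table schema, for illustration only.
    sensor_id = tb.Int32Col()
    temperature = tb.Float64Col()


def build_and_query(path):
    """Write ten rows to a new HDF5 file and return the ids of the warm sensors."""
    with tb.open_file(path, mode="w", title="Demo file") as h5:
        # HDF5 files are hierarchical: groups act like directories.
        group = h5.create_group("/", "lab", "Lab measurements")
        table = h5.create_table(group, "readings", Reading, "Sensor readings")

        # Fill the table row by row, then flush the buffer to disk.
        row = table.row
        for i in range(10):
            row["sensor_id"] = i
            row["temperature"] = 20.0 + i
            row.append()
        table.flush()

        # In-kernel query: the condition is a string, evaluated by NumExpr
        # without loading the whole table into memory.
        return [int(r["sensor_id"]) for r in table.where("temperature > 26.5")]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    print(build_and_query(demo_path))
```

For a table this small the query style makes no practical difference; the condition-string form is aimed at tables that do not fit comfortably in memory.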
