
Processing large JSON files without running out of memory

Description

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. If you're running in the cloud, you can get a machine with more memory, but that means higher costs. How can you process these large files without running out of memory?

In this talk you'll learn:

- How to measure memory usage (see the sketch below).
- Some of the reasons why loading JSON uses so much memory.
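For example, here is a minimal sketch of measuring memory use with Python's built-in tracemalloc module; the file name is a placeholder, and the talk may well use a different measurement tool:

```python
import json
import tracemalloc

tracemalloc.start()

# Load a JSON document and record how much memory the parsed
# objects occupy ("large.json" is just a placeholder path).
with open("large.json") as f:
    data = json.load(f)

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```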

Then, you'll learn some of the solutions to this problem:

- Using a more efficient in-memory representation.
- Only loading the subset of the data you need.
- Streaming parsing, which can parse arbitrarily large files with a fixed amount of memory.
- Using a different file format, like JSON Lines.
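As a rough illustration of the last two approaches, here is a sketch of streaming parsing with the third-party ijson library and of reading a JSON Lines file record by record. The choice of ijson, the file names, and the process() helper are assumptions for illustration, not necessarily what the talk itself uses:

```python
import json

import ijson  # third-party streaming parser (pip install ijson); one possible choice


def process(record):
    """Placeholder for whatever per-record work you need to do."""
    pass


# Streaming parse: iterate over the elements of a top-level JSON array
# one record at a time, so memory use stays roughly constant.
with open("large.json", "rb") as f:
    for record in ijson.items(f, "item"):
        process(record)

# JSON Lines: one JSON document per line, so a plain loop over the
# file never holds more than a single record in memory.
with open("large.jsonl") as f:
    for line in f:
        process(json.loads(line))
```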
