Description
Notebooks have traditionally been a tool for drafting code and avoiding repeated expensive computations while exploring solutions. However, with new tools like nteract's papermill and scrapbook libraries, a notebook can now also serve as a reusable, parameterizable template for execution. We'll look at how to make use of this pattern for data and ETL processes.
Intro
- Myself, Netflix, and Why I'm here
- What does a Data Platform Team do?
- Projects and open source tools discussed in the presentation: papermill, Jupyter, nteract, etc.
Notebooks
What are Jupyter Notebooks?
We'll walk through some visual examples and breakdowns of notebooks.
How Notebooks Work
A guide through how a notebook executes and the model it uses to run your code.
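As a rough illustration of that execution model, the sketch below builds a tiny notebook in the nbformat v4 layout and runs its code cells in order against one shared namespace. The `run_notebook` helper is hypothetical, and `exec` is only a simplified stand-in for a real Jupyter kernel, which runs in a separate process and returns much richer outputs.

```python
import contextlib
import io

# A minimal notebook in the nbformat v4 layout: metadata plus an ordered list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Demo"},
        {"cell_type": "code", "metadata": {}, "source": "x = 2",
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {}, "source": "print(x * 21)",
         "outputs": [], "execution_count": None},
    ],
}


def run_notebook(nb):
    """Run code cells top to bottom in one shared namespace, storing outputs in place."""
    namespace = {}  # stands in for the kernel's long-lived state
    count = 0
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells are never sent for execution
        count += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(cell["source"], namespace)
        cell["execution_count"] = count
        text = buffer.getvalue()
        cell["outputs"] = (
            [{"output_type": "stream", "name": "stdout", "text": text}] if text else []
        )
    return nb


run_notebook(notebook)
print(notebook["cells"][2]["outputs"][0]["text"])  # prints 42
```

The point to notice is that the outputs are written back into the same document that holds the code, which is what lets an executed notebook double as a record of the run.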
Traditional Use Cases
Around experimentation and code development.
New Use Cases
For production data and operations without full rewrites of Notebook code.
Papermill
What is papermill?
papermill is a library for executing notebooks programmatically.
How do you use it?
You'll see some examples in Python and with its provided CLI.
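A minimal sketch of what such a call might look like, using papermill's `execute_notebook` function. The `run_report` wrapper, the notebook paths, and the parameter names are placeholders invented for illustration, and the import is deferred so the snippet loads even where papermill is not installed.

```python
def run_report(input_path, output_path, **parameters):
    """Execute input_path with the given parameters and save the result to output_path."""
    import papermill as pm  # imported lazily; requires papermill to be installed

    return pm.execute_notebook(input_path, output_path, parameters=parameters)


# Hypothetical usage; the paths and parameter names are placeholders.
# run_report("template.ipynb", "runs/us.ipynb", region="us", sample_rate=0.1)
#
# A roughly equivalent CLI call:
#   papermill template.ipynb runs/us.ipynb -p region us -p sample_rate 0.1
```

The input notebook is left untouched; each run produces a separate output notebook containing the parameters it was given and the results it produced.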
How does it fit into the Notebook model?
We'll relate the execution back to the original notebook execution diagrams.
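One way to picture how parameterization fits that model: papermill looks for a cell tagged `parameters` and adds a new cell of overrides after it, so the rest of the notebook runs unchanged with the new values. The sketch below imitates that step with plain dictionaries. The `inject_parameters` function is a simplified illustration, not papermill's actual implementation.

```python
import copy


def inject_parameters(nb, parameters):
    """Return a copy of nb with an override cell placed after the 'parameters' cell."""
    nb = copy.deepcopy(nb)  # leave the template untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "# Parameters\n" + source,
        "outputs": [],
        "execution_count": None,
    }
    # Find the cell the author tagged as holding default parameter values.
    index = next(
        (i for i, cell in enumerate(nb["cells"])
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    # With no tagged cell, index is -1 and the overrides land at the top.
    nb["cells"].insert(index + 1, injected)
    return nb


template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "region = 'us'", "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(region)", "outputs": [], "execution_count": None},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}

run = inject_parameters(template, {"region": "eu"})
print([cell["source"] for cell in run["cells"]])
```

Because the override cell runs after the defaults, ordinary top-to-bottom execution is enough to make the new values win.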
How to extend papermill
Quick pointer to the extensibility of the library and how to add new functionality.
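As one possible example, a custom storage backend can be sketched as a small handler object. This assumes papermill's I/O handler interface consists of `read`, `write`, `listdir`, and `pretty_path`, and that handlers are attached to a path prefix through `papermill.iorw.papermill_io.register`. The `MemoryHandler` class and the `mem://` prefix are made up for illustration.

```python
class MemoryHandler:
    """Hypothetical handler that keeps notebooks in a dict instead of on disk."""

    def __init__(self):
        self._store = {}

    def read(self, path):
        return self._store[path]

    def write(self, buf, path):
        self._store[path] = buf

    def listdir(self, path):
        return [key for key in self._store if key.startswith(path)]

    def pretty_path(self, path):
        return path


def register_memory_handler(handler):
    """Attach the handler to the 'mem://' scheme (a made-up prefix for this sketch)."""
    from papermill.iorw import papermill_io  # lazy import; requires papermill

    papermill_io.register("mem://", handler)


# The handler can be exercised on its own, without papermill.
handler = MemoryHandler()
handler.write('{"cells": []}', "mem://demo.ipynb")
print(handler.read("mem://demo.ipynb"))
```

Once registered, paths starting with the chosen prefix would be routed to the handler instead of the local filesystem.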
Using papermill in production data pipelines
Operationalizing Notebooks
Failure analysis, Productionalization, Sharing executions...
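Because an executed notebook stores its outputs next to its code, failure analysis can start from the output file itself. The sketch below scans a notebook dictionary for outputs of type `error`, following the nbformat v4 output layout. The `find_errors` helper and the sample notebook are hypothetical.

```python
def find_errors(nb):
    """Return (cell_index, ename, evalue) for every error output in the notebook."""
    errors = []
    for index, cell in enumerate(nb.get("cells", [])):
        for output in cell.get("outputs", []):  # markdown cells have no outputs
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


# A hand-written stand-in for an output notebook whose second cell failed.
executed = {
    "cells": [
        {"cell_type": "code", "source": "rows = 0", "outputs": []},
        {"cell_type": "code", "source": "rate = 1 / rows", "outputs": [
            {"output_type": "error", "ename": "ZeroDivisionError",
             "evalue": "division by zero", "traceback": []},
        ]},
    ],
}

print(find_errors(executed))  # [(1, 'ZeroDivisionError', 'division by zero')]
```

Sharing the output notebook then gives whoever investigates the failing cell, its inputs, and its traceback in one place.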
DAGs of Notebooks
Making a pipeline with Notebooks.
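A minimal sketch of the idea, assuming each notebook is one node in the graph and a scheduler only needs to run it after its upstream notebooks. It uses the standard library's `graphlib` (Python 3.9+) for ordering. The notebook names and the `run_pipeline` helper are hypothetical, and the runner is passed in so that a function such as `papermill.execute_notebook` could be substituted for the stub used here.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(dag, run_notebook):
    """Run each notebook after all of its upstream notebooks; return the order used."""
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        # The runner receives an input path and an output path, in that order.
        run_notebook(f"{name}.ipynb", f"output/{name}.ipynb")
    return order


# Hypothetical pipeline: each key maps to the set of notebooks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "report": {"transform"},
}

executed = []  # the stub runner records calls instead of executing anything
order = run_pipeline(dag, lambda src, dst: executed.append((src, dst)))
print(order)  # ['extract', 'transform', 'report']
```

In practice a workflow scheduler would usually own the graph, with each task being a single parameterized notebook run.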
Integration Testing
Good practices where unit testing doesn't fit.
Usage @ Netflix
A quick note on adoption and usage at Netflix.