Description
This tutorial introduces data analysis with Python datatable. Using a banking loan scenario built on a subset of the Fannie Mae and Freddie Mac datasets, we show how to munge loan-level data and how datatable is used to obtain basic insights, in a workflow that starts with data wrangling and continues through exploratory data analysis, model development, and model evaluation.
Python datatable is a library that implements a wide (and growing) range of operators for manipulating two-dimensional data frames. It focuses on big data support, high performance, both in-memory and out-of-memory datasets, and multithreaded algorithms. Datatable’s API is similar to R data.table’s, and it strives to provide a friendlier, more intuitive API experience, with helpful error messages that speed up problem-solving.
Learn more about Python datatable: https://github.com/h2oai/datatable
Prerequisites
- Basic knowledge of Statistics and Machine Learning
- Basic knowledge of Python
- JupyterLab
- Python datatable installed on your local machine or in a cloud environment:
- datatable can be installed by following: https://datatable.readthedocs.io/en/latest/install.html
Note: As of now, datatable is supported only on Linux and macOS. However, it can be used on Windows via a Docker container.
Tutorial:
- Task 0: Introduction to Python datatable (10 mins)
- Task 1: datatable vs Pandas (10 mins)
- Task 2: Understand the dataset (10 mins)
- Task 3: datatable - Data Wrangling (10 mins)
- Task 4: datatable - Exploratory Data Analysis (10 mins)
- Task 5: datatable - Model Development (10 mins)
- Task 6: datatable - Model Evaluation (10 mins)
- Task 7: Q&A (10-15 mins)