Description
PyData DC 2016
Exploratory data analysis plays a critical role in the job of every data scientist, but very few have a structured process or framework for exploring data quickly and efficiently. This talk will introduce the exploratory framework I use in my day-to-day work and will walk attendees through a practical example of how to use the framework to unlock hidden insights with the help of Python libraries.
At the heart of data analysis, there lies a need to understand the real world entities being represented in the data. Every data set we encounter is an attempt to capture a slice of our complex world and communicate some information about it in a way that has potential to be informative to humans, machines, or both. Moving from basic analyses to advanced analytics requires the ability to imagine multiple ways of conceptualizing the composition of entities and the relationships present in our data. It also requires the realization that different levels of aggregation, disaggregation, and transformation can open up new pathways to understanding our data and identifying the valuable insights it contains.
In this talk, we’ll discuss several ways to think about the composition and representation of our data. We’ll also demonstrate a series of methods that leverage tools like networks, hierarchical aggregations, and unsupervised clustering to visually explore our data, transform it to discover new insights, help frame analytical problems and questions, and even improve machine learning model performance. In exploring these approaches, and with the help of Python libraries such as Pandas, Scikit-Learn, Seaborn, and NetworkX, we will provide a practical framework for thinking creatively and visually about your data and unlocking latent value and insights hidden deep beneath its surface.