
Happiness inside a job: a social network analysis


In this talk, we will show how to analyze social network data from a mobile phone application to predict employee turnover and employee happiness. We will look into the process of extracting new features from a dataset using social network analysis techniques. We will also show how these features can be visualized and used to boost machine learning models.


In this talk, we will show how to analyze social network data to predict employee turnover and employee happiness. The talk summarizes several notebooks on social network analysis that will be available on GitHub; all the methodology and details we use are explained in depth in those notebooks.

We will cover the following topics:

  • From tables to graphs: How to use pandas and networkx to merge several DataFrames into a graph, and which criteria to apply when creating a graph representation of a database.
  • Graph features: How to extract new features using social network analysis techniques such as Non-Negative Matrix Factorization (NMF) and graph centrality metrics. We will explain what these features represent and how they relate to other types of features that can be extracted directly from the database.
  • Improving ML models: How to use graph features to boost machine learning models. We will talk about different feature selection techniques in scikit-learn that can be used to select the most significant subset of graph-based features.
  • Visualization: How to visualize social interactions using bokeh, and how visualization techniques can be used to check data integrity.
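The "from tables to graphs" step can be sketched with pandas and networkx roughly as follows. This is a minimal illustration, not the notebooks' code: the `employees` and `interactions` tables, their column names, and the criterion of keeping only same-company interactions without self-loops are hypothetical choices made for the example.

```python
import pandas as pd
import networkx as nx

# Hypothetical tables: the column names are illustrative assumptions,
# not the schema of the talk's data set.
employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "company_id": ["A", "A", "A", "B"],
})
interactions = pd.DataFrame({
    "from_id": [1, 1, 2, 3, 4],
    "to_id":   [2, 3, 3, 1, 4],
    "likes":   [5, 1, 2, 4, 3],
})

# Attach the company of both endpoints to each interaction.
edges = (
    interactions
    .merge(employees.rename(columns={"employee_id": "from_id",
                                     "company_id": "from_company"}),
           on="from_id", how="left")
    .merge(employees.rename(columns={"employee_id": "to_id",
                                     "company_id": "to_company"}),
           on="to_id", how="left")
)

# One possible graph criterion: keep only interactions inside the same
# company and drop self-loops.
edges = edges[(edges["from_company"] == edges["to_company"])
              & (edges["from_id"] != edges["to_id"])]

# Build a directed graph whose edges carry the "likes" count as a weight.
G = nx.from_pandas_edgelist(edges, source="from_id", target="to_id",
                            edge_attr="likes", create_using=nx.DiGraph)

# Copy the company label onto the nodes that ended up in the graph
# (nodes missing from G are skipped by set_node_attributes).
company = employees.set_index("employee_id")["company_id"].to_dict()
nx.set_node_attributes(G, company, "company")
```

Changing the filter, for example keeping cross-company edges or aggregating repeated interactions into a single weighted edge, is one way to express a different criterion for the graph representation.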
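As a rough sketch of the graph-feature step, the example below computes a few centrality metrics with networkx and applies scikit-learn's `NMF` to the adjacency matrix, treating each row of the factor `W` as a node's soft membership in latent groups. The toy graph, the chosen metrics, and `n_components=2` are assumptions for illustration; the notebooks may combine these techniques differently.

```python
import networkx as nx
import pandas as pd
from sklearn.decomposition import NMF

# Toy undirected interaction graph standing in for the real data:
# two tightly connected groups joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2),   # group one
                  (3, 4), (3, 5), (4, 5),   # group two
                  (2, 3)])                  # bridge between the groups

nodes = sorted(G.nodes())

# Centrality metrics: one column per metric, one row per employee.
features = pd.DataFrame({
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
}).loc[nodes]

# NMF on the non-negative adjacency matrix: A is approximated by W @ H,
# and each row of W scores a node's affinity to each latent group.
A = nx.to_numpy_array(G, nodelist=nodes)
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = nmf.fit_transform(A)
for k in range(W.shape[1]):
    features[f"nmf_{k}"] = W[:, k]

print(features.round(3))
```

In this toy graph the two bridge nodes (2 and 3) get the highest betweenness, which illustrates how centrality captures a structural role that a per-employee table would not show directly.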
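One feature selection technique of the kind mentioned above is univariate scoring with scikit-learn's `SelectKBest`. The sketch uses the ANOVA F-test (`f_classif`) on synthetic data; the feature names, the turnover label, and `k=2` are assumptions, and the talk may rely on other selection methods.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 200

# Synthetic turnover label (1 = employee left) and hypothetical graph
# features; "pagerank" is made informative on purpose for the example.
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "degree": rng.normal(size=n),
    "betweenness": rng.normal(size=n),
    "pagerank": y + 0.3 * rng.normal(size=n),
    "nmf_0": rng.normal(size=n),
})

# Score every feature against the label and keep the k best.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = X.columns[selector.get_support()].tolist()
print(selected)
```

In a real evaluation the selector should be fit on training data only, otherwise the selection itself leaks information from the test set.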

We will also review common mistakes that can happen during the modeling process and how to prevent invalid data from leaking into your machine learning models. To conclude, we compare the merits of using a Python framework versus the R language in terms of development time and computational performance. Data set provided by
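A common source of leakage is selecting features on the full data set before cross-validating. One standard safeguard in scikit-learn, sketched below, is to put the selector inside a `Pipeline` so it is refit on each training fold. The data is pure random noise (an assumption for illustration, not the talk's data set), so any accuracy well above 0.5 comes from leakage rather than real signal.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Pure noise: 100 samples, 500 features, labels unrelated to X.
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)

# Leaky: feature selection sees every label before cross-validation.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000),
                        X_leaky, y, cv=5).mean()

# Safe: scaling and selection are refit inside each training fold.
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(f_classif, k=10),
                     LogisticRegression(max_iter=1000))
safe = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky={leaky:.2f} safe={safe:.2f}")
```

The leaky estimate is optimistic because the 10 "best" noise features were chosen using the very labels the model is later scored on; the pipeline version should hover around chance.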
