Simplified statistics through simulation

Description

We will learn how to make valid statistical inferences using only Python and NumPy, in a way that is easy to understand. You shouldn't need to put blind faith in your local statistical expert. This tutorial will show you how to lean less on theory and validate your methods concretely using simulation.

When I started learning more about statistics, I became very frustrated with the numerous specialized tests and statistical measures. They all have their place, but they tend to make an already intimidating field even more intimidating. As a result, I began to work out ways of validating what I was doing that were based more on repeatable simulation than on theory. I was then able to show fellow analysts working code that illustrated why I was doing the statistical analyses I was doing, and they could replicate and tweak my simulations to explore the possibilities rather than debate. At its heart, this is a tutorial about how to have more constructive conversations about statistical inference with your peers.

I will walk attendees through the topic via a Python notebook, and all charting will be done live in front of them using either Seaborn or Matplotlib. The core of the topic is using distributions of data to drive all of our inference. I will take a few minutes at the beginning to show (via simulation) that these simulations on the data distributions closely match their theoretical probability distributions. In other words, a beta distribution can be modeled using a distribution of 0's and 1's, a normal distribution can be built by randomly sampling means from observed static data, etc.
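As a rough illustration of that claim, the sketch below resamples a set of 0's and 1's and compares the resulting distribution of conversion rates with draws from a beta distribution. The data (120 conversions out of 1,000 visitors), the helper name `bootstrap_means`, and the choice of Beta(successes + 1, failures + 1) as the comparison are assumptions made for this example, not taken from the talk's notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed data: 1 = converted, 0 = did not convert.
observed = np.array([1] * 120 + [0] * 880)


def bootstrap_means(data, n_resamples, rng):
    """Resample the data with replacement; return the mean of each resample."""
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    return resamples.mean(axis=1)


# Distribution of conversion rates obtained purely by simulation.
sim_rates = bootstrap_means(observed, 5_000, rng)

# Comparison: beta distribution parameterized by the observed counts
# (assumption: a uniform prior, which gives the "+ 1" terms).
successes = int(observed.sum())
failures = observed.size - successes
beta_rates = rng.beta(successes + 1, failures + 1, size=5_000)

# Central 95% interval of each distribution.
sim_interval = np.percentile(sim_rates, [2.5, 97.5])
beta_interval = np.percentile(beta_rates, [2.5, 97.5])

print("simulated mean:", sim_rates.mean(), "interval:", sim_interval)
print("beta mean:     ", beta_rates.mean(), "interval:", beta_interval)
```

With this much data the two summaries should land close together; with only a handful of observations they drift apart, which is itself something a simulation makes easy to see.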

I will cover the following topics:

- Simulations and Monte Carlo methods
- Parametric vs. non-parametric statistical tests
- Bootstrapping
- Solve probability puzzles with simulation
- Answer "How many samples does this experiment need?"
- Find split test conversion lift using simulation
- From here to Bayesian statistics (quick shout out to PyMC)
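Two of the topics above, bootstrapping and split test conversion lift, can be sketched together in a few lines of NumPy. This is a minimal example under stated assumptions: the visitor and conversion counts are invented, the helper `resampled_rates` is hypothetical, and lift is defined here as the relative difference (B - A) / A. The talk's own notebook may approach it differently.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical split test results: 1 = converted, 0 = did not convert.
group_a = np.array([1] * 100 + [0] * 900)   # control: 10% observed rate
group_b = np.array([1] * 130 + [0] * 870)   # variant: 13% observed rate


def resampled_rates(data, n_resamples, rng):
    """Bootstrap: resample with replacement, return each resample's rate."""
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    return resamples.mean(axis=1)


n_resamples = 4_000
rates_a = resampled_rates(group_a, n_resamples, rng)
rates_b = resampled_rates(group_b, n_resamples, rng)

# Relative lift of B over A in each simulated replay of the experiment.
lift = (rates_b - rates_a) / rates_a

# Share of simulations in which the variant beat the control.
prob_b_better = (rates_b > rates_a).mean()

# Central 95% interval and median of the simulated lift.
lift_low, lift_median, lift_high = np.percentile(lift, [2.5, 50, 97.5])

print("share of simulations where B beats A:", prob_b_better)
print("median lift:", lift_median)
print("95% interval for lift:", (lift_low, lift_high))
```

The same loop can be pointed at the sample size question: shrink or grow the hypothetical groups and rerun to see how wide the lift interval becomes.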

Slides available here: https://github.com/jcbozonier/PyData2015
