
Bayesian Statistical Analysis using Python - Part 2

Summary

This hands-on tutorial will introduce statistical analysis in Python using Bayesian methods. Bayesian statistics offers a flexible and powerful way of analyzing data, but it is computationally intensive, which makes Python's numerical libraries a good fit. As a gentle introduction, we will solve simple problems using NumPy and SciPy, before moving on to Markov chain Monte Carlo methods to build more complex models using PyMC.
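As a rough illustration of the kind of simple problem that NumPy and SciPy can handle directly, the sketch below updates a Beta prior with binomial data and cross-checks the result with a grid approximation. The data (7 successes in 20 trials) and the uniform Beta(1, 1) prior are made up for illustration and are not taken from the tutorial materials.

```python
import numpy as np
from scipy import stats

# Hypothetical data, chosen only for illustration: 7 successes in 20 trials.
successes, trials = 7, 20

# Assumed prior: Beta(1, 1), i.e. uniform on the success probability.
prior_a, prior_b = 1.0, 1.0

# Conjugate update: a Beta prior with a binomial likelihood gives a Beta posterior.
post_a = prior_a + successes
post_b = prior_b + (trials - successes)
posterior = stats.beta(post_a, post_b)

posterior_mean = posterior.mean()
ci_low, ci_high = posterior.ppf([0.025, 0.975])  # central 95% credible interval

# Cross-check with a grid approximation: evaluate prior * likelihood on a grid
# of candidate probabilities and normalize so the weights sum to one.
grid = np.linspace(0.0, 1.0, 1001)
unnormalized = stats.binom.pmf(successes, trials, grid) * stats.beta.pdf(grid, prior_a, prior_b)
grid_posterior = unnormalized / unnormalized.sum()
grid_mean = float((grid * grid_posterior).sum())

print("posterior mean (exact):", posterior_mean)
print("posterior mean (grid): ", grid_mean)
print("95% credible interval: ", (ci_low, ci_high))
```

The closed-form update is only available for conjugate pairs like this one; the grid approach is more general but scales poorly with the number of parameters, which is part of the motivation for sampling methods.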

Description

The aim of this course is to introduce new users to the Bayesian approach to statistical modeling and analysis, so that they can use Python packages such as NumPy, SciPy and PyMC effectively to analyze their own data. It is designed to get users up and running quickly with Bayesian methods, incorporating just enough statistical background to allow users to understand, in general terms, what they are implementing. The tutorial will be example-driven, with illustrative case studies using real data. Selected methods will include approximation methods, importance sampling, and Markov chain Monte Carlo (MCMC) methods such as Metropolis-Hastings and slice sampling. In addition to model fitting, the tutorial will address important techniques for model checking and model comparison, as well as steps for preparing data and processing model output. Tutorial content will be derived from the instructor's book Bayesian Statistical Computing using Python, to be published by Springer in late 2014.
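To give a flavor of the MCMC methods mentioned above, here is a minimal random-walk Metropolis sampler written with NumPy only. It is a sketch under stated assumptions, not the tutorial's or PyMC's implementation: the target is the posterior of a binomial success probability under a uniform prior, the data (7 successes in 20 trials) are hypothetical, and the function names `log_posterior` and `metropolis` are invented for this example.

```python
import numpy as np

def log_posterior(p, successes=7, trials=20):
    """Unnormalized log posterior: uniform prior on (0, 1) times binomial likelihood.

    The data defaults are hypothetical values chosen for illustration.
    """
    if p <= 0.0 or p >= 1.0:
        return -np.inf  # zero posterior density outside the unit interval
    return successes * np.log(p) + (trials - successes) * np.log(1.0 - p)

def metropolis(log_target, start, n_samples, step, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = start
    current_lp = log_target(current)
    accepted = 0
    for i in range(n_samples):
        proposal = current + step * rng.normal()
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples[i] = current  # a rejected proposal repeats the current value
    return samples, accepted / n_samples

samples, acceptance_rate = metropolis(log_posterior, start=0.5, n_samples=20000, step=0.2)
posterior_draws = samples[2000:]  # discard early draws as burn-in

print("acceptance rate:", acceptance_rate)
print("posterior mean estimate:", posterior_draws.mean())
```

Because the proposal is symmetric, the acceptance ratio needs only the target density, and only up to a normalizing constant. The step size of 0.2 and the burn-in of 2000 draws are arbitrary choices for this toy problem; in practice both would be tuned and checked with convergence diagnostics.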

Figure: PyMC forest plot

Figure: DAG

All course content will be available as a GitHub repository, including IPython notebooks and example data.

Tutorial Outline

  1. Overview of Bayesian Statistics
  2. Bayesian Inference with NumPy and SciPy
  3. Markov chain Monte Carlo (MCMC)
  4. The Essentials of PyMC
  5. Fitting Linear Regression Models
  6. Hierarchical Modeling
  7. Model Checking and Validation

Installation Instructions

The easiest way to install the Python packages required for this tutorial is via Anaconda, a scientific Python distribution offered by Continuum Analytics. Several other tutorials will be recommending a similar setup.

One of the key features of Anaconda is a command line utility called conda that can be used to manage third party packages. We have built a PyMC package for conda that can be installed from your terminal via the following command:

conda install -c https://conda.binstar.org/pymc pymc

This should install any prerequisite packages that are required to run PyMC.
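After installing, one quick way to confirm that the packages are visible to your interpreter is to look them up without importing them. The snippet below is a small sketch using only the standard library (it requires Python 3.4 or later for `importlib.util.find_spec`); the helper name `is_installed` and the list of packages checked are choices made for this example.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# Packages named in this tutorial description.
required = ["numpy", "scipy", "pymc"]
status = {name: is_installed(name) for name in required}

for name in required:
    print("{}: {}".format(name, "found" if status[name] else "missing"))
```

A package reported as missing usually means the install went into a different Python environment than the one running the check.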

One caveat is that conda does not yet have a build of PyMC for Python 3. Python 3 users will therefore have to build it themselves via pip:

pip install git+git://github.com/pymc-devs/pymc.git@2.3

For those of you on Mac OS X who are already using the Homebrew package manager, I have prepared a script that will install the entire Python scientific stack, including PyMC 2.3. You can download the script here and run it via:

sh install_superpack_brew.sh