Description
Filmed at PyData London 2017
Bayesian neural networks have seen a resurgence of interest as a way of generating model uncertainty estimates. I use Edward, a new probabilistic programming framework built on Python and TensorFlow, to perform inference on deep neural networks for several benchmark data sets. This is compared with dropout training, which has recently been shown to be formally equivalent to approximate Bayesian inference.
Abstract

Deep learning methods represent the state of the art for many applications such as speech recognition, computer vision and natural language processing. Conventional approaches generate point estimates of deep neural network weights and hence make predictions that can be overconfident, since they do not account well for uncertainty in the model parameters. However, having some means of quantifying the uncertainty of our predictions is often a critical requirement in fields such as medicine, engineering and finance. One natural response is to consider Bayesian methods, which offer a principled way of estimating predictive uncertainty while also showing robustness to overfitting.
Bayesian neural networks have a long history. Exact Bayesian inference on network weights is generally intractable and much work in the 1990s focused on variational and Monte Carlo based approximations [1-3]. However, these suffered from a lack of scalability for modern applications. Recently the field has seen a resurgence of interest, with the aim of constructing practical, scalable techniques for approximate Bayesian inference on more complex models, deep architectures and larger data sets [4-10].
Edward is a new, Turing-complete probabilistic programming language built on Python [11]. Probabilistic programming frameworks typically face a trade-off between the range of models that can be expressed and the efficiency of inference engines. Edward can leverage graph frameworks such as TensorFlow to enable fast distributed training, parallelism, vectorisation, and GPU support, while also allowing composition of both models and inference methods for a greater degree of flexibility.
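As a rough illustration of the kind of model Edward expresses, here is a minimal sketch of my own (not code from the talk), assuming the Edward 1.x API with loc/scale-parameterised Normal distributions and variational inference via ed.KLqp; the layer sizes and data below are arbitrary placeholders:

```python
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

D, H, N = 5, 16, 100   # input dimension, hidden units, data points (illustrative)

X = tf.placeholder(tf.float32, [N, D])

# Priors over the weights and biases of a one-hidden-layer network
W_0 = Normal(loc=tf.zeros([D, H]), scale=tf.ones([D, H]))
b_0 = Normal(loc=tf.zeros(H), scale=tf.ones(H))
W_1 = Normal(loc=tf.zeros([H, 1]), scale=tf.ones([H, 1]))
b_1 = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# Likelihood: network output with Gaussian observation noise
h = tf.nn.relu(tf.matmul(X, W_0) + b_0)
y = Normal(loc=tf.squeeze(tf.matmul(h, W_1) + b_1), scale=0.1 * tf.ones(N))

# Mean-field variational approximation, one factor per weight tensor
qW_0 = Normal(loc=tf.Variable(tf.zeros([D, H])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([D, H]))))
qb_0 = Normal(loc=tf.Variable(tf.zeros(H)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(H))))
qW_1 = Normal(loc=tf.Variable(tf.zeros([H, 1])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([H, 1]))))
qb_1 = Normal(loc=tf.Variable(tf.zeros(1)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

# Dummy training data, just so the sketch runs end to end
X_train = np.random.randn(N, D).astype(np.float32)
y_train = np.random.randn(N).astype(np.float32)

# Variational inference: minimise KL(q || posterior) over the weight factors
inference = ed.KLqp({W_0: qW_0, b_0: qb_0, W_1: qW_1, b_1: qb_1},
                    data={X: X_train, y: y_train})
inference.run(n_iter=1000)
```

Once inference has run, predictions on held-out inputs can be formed by drawing weight samples from the fitted q distributions (for example via ed.copy), giving a predictive distribution rather than a single point estimate.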
In this talk I will give a brief overview of developments in Bayesian deep learning and demonstrate results of Bayesian inference on deep architectures implemented in Edward for a range of publicly available data sets. Dropout is an empirical technique that has been applied very successfully to reduce overfitting in deep learning models [12]. Recent work by Gal and Ghahramani [13] has demonstrated a surprising formal equivalence between dropout training and approximate Bayesian inference in neural networks. I will compare results obtained with Edward's inference machinery against model averaging over neural networks trained with dropout.
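For the dropout side of the comparison, the idea in [13] is that keeping dropout switched on at prediction time and averaging several stochastic forward passes approximates Bayesian model averaging, with the spread across passes serving as an uncertainty estimate. The following is a minimal sketch of my own, assuming a tf.keras model with Dropout layers and the TensorFlow 2.x eager API (an illustration, not the code used in the talk):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# A small network with dropout; the architecture is arbitrary for illustration.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(5,)),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1),
])

def mc_dropout_predict(model, X, n_samples=50):
    # training=True keeps the Dropout layers stochastic at prediction time,
    # so each forward pass samples a different thinned network.
    preds = np.stack([model(X, training=True).numpy()
                      for _ in range(n_samples)])
    # Predictive mean and a rough per-prediction uncertainty estimate
    return preds.mean(axis=0), preds.std(axis=0)

X_test = np.random.randn(10, 5).astype(np.float32)   # dummy inputs
mean, std = mc_dropout_predict(model, X_test)
```

Because each stochastic pass samples a different thinned network, the standard deviation across passes gives a simple per-prediction uncertainty, which is what gets compared against the predictive spread from the Edward models.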