
An introduction to machine learning on small scale datasets (PyData)

Description

The purpose of this talk is to illustrate the differences between explanatory modelling (classical statistics) and predictive modelling (machine learning), as these two approaches are often conflated. The scikit-learn machine learning library was used to classify Irish farmers who planted forests on their land. The dataset was relatively small, covering 799 Irish farmers and approximately 135 variables. Before the farmers were classified, irrelevant and redundant variables were removed from the dataset using a feature wrapper technique, which improves the predictive power of the models. This illustrates the power of machine learning for inductive analysis: it can uncover previously unknown relationships between variables (features). As the IPython notebooks were computationally demanding, the final code was run with runipy on gaia, a high-performance computer within UCD. Earlier versions of the IPython notebooks were run on Amazon EC2 using StarCluster, which makes high-performance computing available to the general public at reasonable cost.
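As a rough illustration of wrapper-style feature selection in scikit-learn, the sketch below uses recursive feature elimination with cross-validation (`RFECV`). The description does not say which wrapper technique or classifier the talk used, so `RFECV`, logistic regression, and every parameter value here are assumptions. The data is synthetic, generated only to match the stated dimensions (799 rows, 135 variables); it is not the farmer dataset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in with the same shape as the dataset described above.
# The informative/redundant split is an arbitrary assumption.
X, y = make_classification(
    n_samples=799,
    n_features=135,
    n_informative=10,
    n_redundant=20,
    random_state=0,
)

# Hold out a test set so feature selection never sees the evaluation data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Wrapper selection: repeatedly fit the classifier, drop the 5 weakest
# features, and keep the subset size with the best cross-validated accuracy.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=5,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=5,
)
selector.fit(X_train, y_train)
X_train_selected = selector.transform(X_train)

# Baseline: the same classifier trained on all 135 features.
full_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("features kept:", selector.n_features_)
print("test accuracy, all features:", full_model.score(X_test, y_test))
print("test accuracy, selected features:", selector.score(X_test, y_test))
```

Comparing the two accuracies on the held-out split gives a fairer picture than scoring on the rows used for selection, since the selector has already been tuned to those rows.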
