The Shogun Machine Learning Toolbox

Summary

We present the Shogun Machine Learning Toolbox, a framework for Machine Learning (ML), the art of finding structure in data, with applications in object recognition, brain-computer interfaces, robotics, stock price prediction, etc. We give a gentle introduction to ML and Shogun's Python interface, focussing on intuition and visualisation.

Description

We present the Shogun Machine Learning Toolbox, a unified framework for Machine Learning algorithms. Machine Learning (ML) is the art of finding structure in data in an automated way and has given rise to a wide range of applications such as recommendation systems, object recognition, brain-computer interfaces, robotics, predicting stock prices, etc.

Our toolbox offers extensive bindings to other software and programming languages, Python being the major target. The library was initiated in 1999 and has remained under heavy development ever since. In addition to its mature core framework, Shogun offers state-of-the-art techniques based on the latest ML research. This is partly made possible by the 21 Google Summer of Code projects (5+8+8 since 2011) that our students successfully completed. Shogun's codebase has >20k commits made by >100 contributors, representing >500k lines of code. While its core is written in C++, a unique technique for generating interfaces allows usage from a wide range of target languages, all with the same syntax. This includes in particular Python, but also Matlab/Octave, Java, C#, R, Ruby, and more. We believe that users should be able to choose their favourite language rather than have us dictate this choice. The same applies to the supported operating systems (Linux, Mac, Windows). Shogun is part of Debian Linux.

Shogun covers most classical ML methods such as classification, regression, dimensionality reduction, clustering, etc., most of them in several flavours. All algorithms implemented in Shogun work on a modular data representation, which makes it easy to switch between different sorts of objects, for example strings or matrices. Common ML tasks and data IO can be carried out under a unified interface. This is also true for the various external open-source libraries that are embedded within Shogun.
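The idea of an algorithm that is independent of the data representation can be illustrated with a minimal sketch in plain Python. It deliberately does not use Shogun's API: the names `nearest_neighbour_predict`, `euclidean`, and `hamming` are hypothetical and chosen only for illustration. The same 1-nearest-neighbour classifier is applied to numeric vectors and to strings simply by swapping the distance function.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(train, labels, query, distance):
    """Return the label of the training example closest to `query`.

    The classifier never inspects the data itself; it only calls
    `distance`, so any object type with a suitable distance works.
    """
    best = min(range(len(train)), key=lambda i: distance(train[i], query))
    return labels[best]


# Toy data, made up for illustration: numeric feature vectors.
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
vector_labels = ["low", "low", "high", "high"]
vector_prediction = nearest_neighbour_predict(
    vectors, vector_labels, (4.8, 5.1), euclidean
)

# Toy data, made up for illustration: fixed-length strings.
strings = ["AAAA", "AAAT", "GGGG", "GGGC"]
string_labels = ["a-rich", "a-rich", "g-rich", "g-rich"]
string_prediction = nearest_neighbour_predict(
    strings, string_labels, "GGTG", hamming
)

print(vector_prediction, string_prediction)
```

Only the `distance` argument changes between the two calls; the classifier code itself is untouched, which is the design property the modular data representation aims for.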

Code examples are provided for all implemented algorithms. The main and most complete set of examples is in Python. In addition, to promote the use of Shogun in university teaching, we recently started adding more illustrative IPython notebooks. A growing list of statically rendered versions is readily available from our website; they combine tutorial-style explanations, code, and visualisation examples. We even took this up a notch and started building our own IPython notebook server with Shogun installed in the cloud (try the cloud button in the notebook view). This allows users to try Shogun without installation via the IPython notebook web interface. All example notebooks can be loaded, interactively modified, and executed. In addition, using the Python Django framework, we built a collection of interactive web demos where users can play around with basic ML algorithms.

In the proposed talk, we will give a gentle and general introduction to ML and the core functionality of Shogun, with a focus on its Python interface. This includes solving basic ML tasks such as classification and regression, as well as some of the more recent features, such as last year's GSoC projects and their IPython notebook writeups. ML material will be presented with a focus on intuition and visualisation; no previous familiarity with ML methods is required.

Key points in the talk

  • What are the goals in ML?
  • Example problems in ML (classification, regression, clustering)
  • Some basic algorithm ideas
  • Focus on Visualisation, not Maths
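
As a flavour of the "basic algorithm ideas" listed above, regression can be sketched in a few lines of plain Python. This is an illustrative least-squares line fit, not Shogun code; the function name `fit_line` and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Use the learned line to predict the output for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The pattern is the same for most supervised methods: learn parameters from example input/output pairs, then apply them to inputs that were not seen during training.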

Intended Audience

  • All people dealing with data (data scientists, big-data hackers) who are looking for tools to deal with it
  • People with a general interest but no education in Machine Learning
  • People interested in the technology behind Shogun (SWIG, cloud notebook server, web demos)
  • People from the ML community (SciPy stack)
  • ML scientists/Statisticians

Code examples

Slide examples

See our EuroPython 2010 slides, although we aim for more pictures and fewer formulas this year.
