Is that spam in my ham?


Lorena Mesa - Is that spam in my ham? [EuroPython 2016] [18 July 2016] [Bilbao, Euskadi, Spain]

Beginning programmers or Python beginners may find it overwhelming to implement a machine learning algorithm, even as machine learning becomes applicable to more and more areas. This talk introduces key concepts and ideas and uses Python to build a basic classifier, a common type of machine learning problem. It also provides some jargon to help those who may be self-educated or currently learning.

Supervised learning, machine learning, classifiers, big data! What in the world are all of these things? For a beginning programmer, the questions described as "machine learning" questions can be mystifying at best.

In this talk I will define the scope of a machine learning problem, identifying an email as ham or spam, from the perspective of a beginner (a non-master of all things "machine learning"), and show how Python can help us learn, in a simple way, how to classify a piece of email.

To begin we must ask, what is spam? How do I know it "when I see it"? From previous experience, of course! We will provide human-labeled examples of spam to our model so that it can estimate how likely a message is to be spam or ham. This approach, using examples and data we already know to determine the most likely label for a new example, is the basis of the Naive Bayes classifier.
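As a rough guide to what "most likely label" means here, the standard Naive Bayes scoring rule can be written as below. The symbols are assumptions introduced for illustration and do not come from the talk: $w_1, \dots, w_n$ are the words in an email, and the "naive" part is treating the words as independent given the label.

```latex
P(\text{spam} \mid w_1, \dots, w_n) \;\propto\; P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
```

The same expression is evaluated with ham in place of spam, and the label with the larger score is chosen.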

Our model will look at the words in the body of an email, finding how often each word appears in spam and in ham emails, and how often spam and ham occur overall. Once we know the prior likelihood of spam and what makes something spam, we can try applying a label to a new example.
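The counting described above might be sketched as follows. This is a minimal illustration, not the talk's actual code: the function names `tokenize`, `train`, and `classify`, the four toy emails, the whitespace tokenizer, and the add-one smoothing for unseen words are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label word occurrences from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    # Every distinct word seen in any training email.
    vocabulary = set().union(*word_counts.values())
    return label_counts, word_counts, vocabulary


def classify(text, label_counts, word_counts, vocabulary):
    """Return the label with the highest Naive Bayes log-score."""
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Prior: how often this label occurs in the training data.
        score = math.log(label_counts[label] / total_emails)
        words_in_label = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing (an assumption) so unseen words do not
            # produce a zero probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (words_in_label + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, hand-labeled training data (hypothetical).
examples = [
    ("win cash prize now", "spam"),
    ("cheap cash offer click now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("project notes for the meeting", "ham"),
]
model = train(examples)

print(classify("claim your cash prize", *model))    # spam
print(classify("notes for lunch meeting", *model))  # ham
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow when an email contains many words.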

Through this exercise we will see at a basic level what types of questions machine learning asks, learn to model "learning" with Python, and understand how learning can be measured.
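One common way to measure "learning" is to compare predicted labels with known labels on examples the model has not seen. The sketch below assumes accuracy, precision, and recall as the metrics; the abstract does not say which measures the talk uses, and the function name `evaluate` and the sample labels are hypothetical.

```python
def evaluate(true_labels, predicted_labels, positive="spam"):
    """Compare predictions with known labels and return simple metrics."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(t == p for t, p in pairs)
    # Counts relative to the "positive" class (spam by default).
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Share of all emails labeled correctly.
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Of the emails flagged as spam, how many really were spam.
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        # Of the real spam emails, how many were flagged.
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Hypothetical held-out labels and model predictions.
true_labels = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]
metrics = evaluate(true_labels, predicted)
print(metrics)
```

Precision and recall are reported alongside accuracy because a spam filter's two kinds of mistakes (flagging real mail, missing real spam) usually do not cost the same.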
