Description
Machine learning is everywhere. In cybersecurity, it was one of the biggest buzzwords of 2015 and is one of the most prominent tools for detecting anomalies, particularly in User Behavior Analysis. This talk is a journey through the security techniques CloudLock has put in place to detect anomalies using Python. We will start by looking at data collection and processing, implemented mostly as Celery tasks, which produce a clean, enriched feed of billions of events daily. We will then present our use of Apache Spark MLlib with Python and how we leverage AWS Lambda and EMR in our flexible infrastructure. Finally, we will show an example of data visualization built around our threat map. We will also discuss the pros and cons of using Python with Apache Spark at scale, as well as Docker as a way to encapsulate parts of the data pipeline and to serve the models to the application. Come learn how to build an efficient ETL pipeline at scale, using Python all the way!
Slides: "Deep dive into threat detection in the cloud with Spark and Python" by David Melamed