www.pydata.org
PyData London
Matt Thomas & Natalia Angarita-Jaimes
Description
We will talk about a framework we have developed that uses Machine Learning and other advanced analytical methods to reduce risk in the public sector. This Python-based assurance scoring framework, developed with Pandas and Scikit-Learn, shifts the emphasis of traditional risk-scoring frameworks towards identifying compliant behaviour. We discuss some of the challenges faced and present a case study.
Abstract
Traditionally, risk-scoring frameworks are built around the customer journey to identify non-compliant or fraudulent behaviour. These frameworks combine data from different sources with historical cases of known fraud to identify high-risk transactions or applications. In the public sector, however, the emphasis is often on identifying low-risk customers. In this talk we will discuss an Assurance Scoring framework that applies these traditional machine learning and analytics techniques but shifts the emphasis to identifying the customers who pose minimal risk. The advantage of this approach is that low-risk transactions can be automated, and since these cover the majority of customers, resources can be focused more effectively on the exceptional high-risk cases.
The framework has been developed in Python, in particular with Pandas and Scikit-Learn, but we also go beyond Machine Learning to incorporate other techniques such as rules-based linking, anomaly detection and graph-based analysis, and show how these can be used to boost confidence in the low-risk group. In particular, we will showcase how different Python packages have been integrated to address data pre-processing, feature engineering, and model building and validation, and how we solved the challenges that arose during integration by developing a range of testing procedures.
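The talk does not publish the framework's code, so the following is only a rough, hypothetical illustration of the assurance-scoring idea. It assumes a labelled table of historical cases with made-up features (`amount`, `prior_claims`, `years_known`) filled with synthetic data. A scikit-learn `RandomForestClassifier` estimates each case's probability of non-compliance, and an `IsolationForest` serves as one possible anomaly-detection check. A case counts as "assured" (eligible for automation) only if its risk score is below a placeholder threshold and the anomaly detector does not flag it. The helper `assurance_split` and the 0.05 threshold are illustrative assumptions, not the authors' method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for linked historical cases (all feature names are hypothetical).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "amount": rng.lognormal(mean=6, sigma=1, size=n),
    "prior_claims": rng.poisson(1.0, size=n),
    "years_known": rng.integers(0, 20, size=n),
})
# Hypothetical label: True = case later confirmed as non-compliant.
logit = -3 + 0.0005 * df["amount"] + 0.6 * df["prior_claims"] - 0.1 * df["years_known"]
df["non_compliant"] = rng.random(n) < 1 / (1 + np.exp(-logit))

features = ["amount", "prior_claims", "years_known"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["non_compliant"],
    test_size=0.3, random_state=0, stratify=df["non_compliant"],
)

# Supervised model: column 1 of predict_proba is P(non_compliant=True),
# because classes_ is sorted as [False, True].
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
risk = clf.predict_proba(X_test)[:, 1]

# Unsupervised check: IsolationForest.predict returns 1 for inliers, -1 for outliers.
iso = IsolationForest(random_state=0).fit(X_train)
is_inlier = iso.predict(X_test) == 1


def assurance_split(risk, is_inlier, threshold):
    """Hypothetical helper: mask of cases that are low risk AND not anomalous."""
    return (np.asarray(risk) < threshold) & np.asarray(is_inlier)


RISK_THRESHOLD = 0.05  # placeholder tolerance; a real value would be set by policy
assured = assurance_split(risk, is_inlier, RISK_THRESHOLD)

print(f"{assured.sum()} of {len(assured)} held-out cases eligible for automation")
print("non-compliance rate, all held-out cases:", y_test.mean())
print("non-compliance rate, assured group:", y_test[assured].mean())
```

Requiring agreement between two independent signals is one way to boost confidence in the low-risk group in the spirit the abstract describes. Rules-based linking or graph-based features could be added as further filters in the same mask and are not shown here.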
PyData London 2016
PyData is a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply our language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.
We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.