Description
As data scientists, we usually like to apply fancy machine learning models to well-groomed datasets. Everyone working on industrial problems eventually learns that this does not reflect reality. The time spent on modeling is small compared to the time spent on data gathering, warehousing, and cleaning. Even after the model has been trained and deployed, the work is not done: its performance and its input data still need to be monitored continuously.
In this talk, I discuss how important data handling is for successful data science projects. I address each milestone, from identifying the business case to continuously monitoring the solution's performance. I illustrate this with an example project whose goal was to improve a production system.