Description
Reproducibility is a cornerstone of the scientific method. Especially in production machine learning, it is crucial to ensure that a hidden source of randomness is not the real reason for an improvement in model performance. In my talk I will elaborate on the importance of reproducibility and show how we build reproducible machine learning pipelines at Netguru.
Although reproducibility seems to be a must-have in machine learning papers, it is still not a standard.
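To make the point about hidden randomness concrete, here is a minimal sketch. It is not code from the talk: `noisy_accuracy`, its parameters, and the simulated 80% accuracy are assumptions made up for illustration. It scores the same simulated model under different seeds, so any difference between runs comes from randomness alone, while a fixed seed gives a repeatable result.

```python
import random


def noisy_accuracy(seed: int, n_samples: int = 200, true_accuracy: float = 0.80) -> float:
    """Simulate evaluating one fixed model on a randomly drawn test set.

    Hypothetical helper: the model never changes, only the seed does.
    """
    rng = random.Random(seed)  # local generator, so no hidden global state
    hits = sum(rng.random() < true_accuracy for _ in range(n_samples))
    return hits / n_samples


# Same "model", ten different seeds: the scores differ only by chance.
scores = [noisy_accuracy(seed) for seed in range(10)]
spread = max(scores) - min(scores)
print(f"min={min(scores):.3f} max={max(scores):.3f} spread={spread:.3f}")

# Fixing the seed makes the run repeatable.
print(noisy_accuracy(42) == noisy_accuracy(42))
```

If the improvement reported for a new model is smaller than the spread printed above, the seed is a plausible explanation for it.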
Outline of talk:
- Definitions:
  - reproducibility
  - replicability
  - generalisability
- Motivation for achieving reproducibility
- Full reproducibility == Continuous Delivery for ML
- Changes in the ML development process:
  - code
  - data
  - models
- How do we manage change in the ML development process?
- Data versioning
  - Quilt Data
- Experiment management
  - MLflow / Polyaxon
- Summary