Description
Reproducibility is a cornerstone of the scientific method. In production machine learning especially, it is crucial to ensure that a hidden source of randomness is not the real reason for an improvement in model performance. In my talk I will elaborate on the importance of reproducibility and show how we build reproducible machine learning pipelines at Netguru.
Although reproducibility in machine learning papers seems to be a must-have, it is still not standard practice.
Outline of talk:
- Definitions:
  - reproducibility
  - replicability
  - generalisability
- Motivation for achieving reproducibility
- Full reproducibility == Continuous Delivery for ML
- Changes in the ML development process
  - code
  - data
  - models
- How do we manage change in the ML development process?
  - Data versioning
    - Quilt Data
  - Experiment management
    - MLflow / Polyaxon
- Summary
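
To make the "hidden source of randomness" concern from the description concrete, here is a minimal sketch using only the Python standard library. The function `noisy_score` and its numbers are hypothetical stand-ins for a training run and are not taken from the talk; the point is only that a score can move between runs when nothing but the seed changes.

```python
import random


def noisy_score(seed=None):
    """Hypothetical stand-in for a training run.

    The returned "validation score" depends only on random draws,
    not on any real change to code, data, or model.
    """
    rng = random.Random(seed)  # local generator; seed=None falls back to OS entropy
    # Simulate the combined effect of a random split and random initialisation.
    samples = [rng.gauss(0.80, 0.02) for _ in range(20)]
    return sum(samples) / len(samples)


# Same seed: the two runs produce exactly the same score.
run_a = noisy_score(seed=42)
run_b = noisy_score(seed=42)

# Different seed: the score shifts although the "pipeline" is unchanged.
run_c = noisy_score(seed=7)

print(run_a == run_b)
print(run_a != run_c)
```

Fixing and recording the seed is only one piece of reproducibility; the outline above also covers versioning of data and tracking of experiments, which this sketch does not attempt to show.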