Description
Once your machine learning POC seems promising and your development environment is set up, the next step is to refactor your code and write TESTS. We know that many people think tests are too complicated and boring to write, that they are not very useful, and that a few manual checks can address the need.
There is more than one way to test. Tests can be split into several levels (unit, component, functional, performance, etc.) so that faulty code, data, or parameters can be identified quickly. Tests must also be automated in a Continuous Integration (CI) pipeline and run at least on each experiment before it is merged into the baseline pipeline, just as in software engineering (where the CI is triggered on each feature branch). A minimal example of a unit-level test is sketched below.
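As a rough illustration of the unit level, here is a sketch of what such a test can look like in an ML code base, assuming pytest as the test runner; the `scale_features` function and its expected behavior are hypothetical stand-ins for your own preprocessing code:

import numpy as np
import pytest


def scale_features(x: np.ndarray) -> np.ndarray:
    """Hypothetical preprocessing step: scale each column to zero mean, unit variance."""
    std = x.std(axis=0)
    if np.any(std == 0):
        raise ValueError("constant feature cannot be scaled")
    return (x - x.mean(axis=0)) / std


def test_scale_features_zero_mean_unit_variance():
    # Unit test: checks one small, deterministic property of one function.
    x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    scaled = scale_features(x)
    np.testing.assert_allclose(scaled.mean(axis=0), 0.0, atol=1e-12)
    np.testing.assert_allclose(scaled.std(axis=0), 1.0, atol=1e-12)


def test_scale_features_rejects_constant_column():
    # Failure cases deserve tests too: a constant column has zero variance.
    x = np.ones((3, 2))
    with pytest.raises(ValueError):
        scale_features(x)

Because each test exercises a single function on tiny, deterministic data, a failure points directly at the faulty code rather than at the data or the parameters.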
This talk is about how to write tests and testable code easily, how to avoid the most common traps, and what benefits tests on unrealistic data bring to your Machine Learning project.
(Tests on real data are also really important, but they are not the main focus of this talk.)
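To give a flavor of what a test on unrealistic data might look like, here is a sketch assuming scikit-learn is available; `build_model` is a hypothetical stand-in for your own pipeline. The data is deliberately simple and fully controlled, so a failure points at the code rather than at the data:

import numpy as np
from sklearn.linear_model import LogisticRegression


def build_model():
    """Hypothetical stand-in for the model construction code under test."""
    return LogisticRegression()


def test_pipeline_learns_a_trivially_separable_problem():
    # Unrealistic, fully synthetic data: two well-separated Gaussian blobs.
    # If the pipeline cannot fit this, the bug is in the code, not the data.
    rng = np.random.default_rng(seed=0)
    x_pos = rng.normal(loc=+5.0, scale=0.5, size=(50, 2))
    x_neg = rng.normal(loc=-5.0, scale=0.5, size=(50, 2))
    x = np.vstack([x_pos, x_neg])
    y = np.array([1] * 50 + [0] * 50)

    model = build_model()
    model.fit(x, y)

    # On such an easy problem, anything below near-perfect accuracy is a red flag.
    assert model.score(x, y) >= 0.99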
Slides are here: sdg.jlbl.net/slides/tests_for_datascientist/presentation.html
Good practices tell you that you must write tests! But testing Machine Learning projects can be really complicated, and writing tests often seems inefficient. Which kinds of tests should be written? How do you write them? What are the benefits?