Developing machine learning models is easy; training, deploying, monitoring, scaling, and maintaining them in an automated fashion, all while keeping your sanity, is hard.
In this session, I'll discuss the infrastructure and tooling my small team of data science practitioners and engineers is using to manage and orchestrate the machine learning model lifecycle, including pitfalls we've encountered along the way. Particular attention will be paid to where we've opted to use off-the-shelf solutions versus developing our own, the importance of developer ergonomics, and how to maximally empower data scientists to get their work into production without the need for a dedicated MLOps team.
The talk will cover our ML stack as it exists in production today, and will touch on our application of a number of technologies and techniques, including:
- AWS SageMaker
- Airflow
- Docker
- Cookiecutter
- Property-based testing
- Jsonschema
- Linting
- Slack integration
- Model artifacts and diagnostics
- Automated deployments and rollbacks
- Healthchecks
- Autoscaling
- DBT
At the end of the session, attendees should leave with new insights they can apply immediately to their own ML systems and infrastructure, as well as a better understanding of how to minimize real-world engineering and ops overhead across data science teams of any size and composition.