Description
Companies with an artificial-intelligence strategy can differentiate themselves in the intelligence economy; however, implementing robust machine learning in production is nontrivial, often requiring close collaboration between data scientists and developers, and retooling of the production stack and workflows to develop and maintain accurate models.
Machine learning in production involves applying models to live data, handling missing values and data artifacts, and coping with inputs that fall outside the training calibration. A rigorous evaluation framework draws on logging to characterize model coverage, model performance, audit results, and run-time performance. Model coverage is the fraction of calls for which the model produced sensible output; coverage drops when the model fails to converge or its acceptance criteria are not met. Model performance is evaluated with a suite of metrics (accuracy, AUC, FPR, TPR, RMSE, MAPE, etc.), which help determine the most appropriate model for the production scenario and validate the model training. Regular manual audits provide spot checks that aid debugging and confirm the model passes sanity checks. Run-time performance involves timing and profiling the model's components, ensuring latency stays within specified requirements and refactoring when it does not.
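A logging-driven evaluation of this kind can be sketched in a few lines of Python. The snippet below is a minimal illustration, not code from the talk: the `evaluate` helper, the `predict_fn` interface, and the convention that a failed prediction returns `None` are all assumptions for the sketch; the metrics come from scikit-learn.

```python
import time
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(predict_fn, X, y_true, latency_budget_s=0.1):
    """Score a binary classifier on coverage, metrics, and per-call latency.

    predict_fn takes one example and returns a probability in [0, 1],
    or None when the model fails to converge or its criteria are not
    met (a hypothetical convention for this sketch).
    """
    calls, covered = 0, 0
    probs, labels, latencies = [], [], []

    for x, y in zip(X, y_true):
        calls += 1
        start = time.perf_counter()
        p = predict_fn(x)
        latencies.append(time.perf_counter() - start)

        # Coverage counts only calls that produced sensible output.
        if p is None or not np.isfinite(p):
            continue
        covered += 1
        probs.append(p)
        labels.append(y)

    preds = [int(p >= 0.5) for p in probs]
    p95 = float(np.percentile(latencies, 95))
    return {
        "coverage": covered / calls if calls else 0.0,
        "accuracy": accuracy_score(labels, preds),
        "auc": roc_auc_score(labels, probs),
        "p95_latency_s": p95,
        "within_budget": p95 <= latency_budget_s,
    }
```

In production the same counters would typically be accumulated from logged prediction records rather than a single in-process loop, but the quantities reported (coverage ratio, metric suite, tail latency against a budget) are the same ones the evaluation framework tracks.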
In the AI renaissance, where ML is a critical piece of intelligent products, seamlessly integrating model evaluation into workflows is an important part of building robust products and a satisfying customer experience. Python is a great language for building intelligent products, with its abundance of open-source ML libraries and wrappers as well as its rich full-stack capabilities.