
Deep model serving - scale and ergonomics

Description

A serving system for Deep Learning models is a tricky design problem. It's a large-scale production system, so we want it to scale well, adapt to changing traffic patterns, and have low latency. It's also part of the Data Scientist's core loop, so it should be very flexible, and running an experiment on live traffic should be easy. In this talk, I'll discuss key design considerations for such a system, covering both perspectives. I'll also describe a system we built at Taboola for serving TensorFlow models. It serves billions of requests per day, spread over dozens of models, and still has pretty good ergonomics.
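
For context on what "serving a TensorFlow model" looks like from the client side, here is a minimal sketch that queries a standard TensorFlow Serving REST endpoint. This is not Taboola's system from the talk; the host, port, and model name are placeholders, and it assumes a TensorFlow Serving instance is already running.

# Minimal sketch: calling a TensorFlow Serving :predict REST endpoint.
# Not the system described in the talk; host/port/model are placeholders.
import requests

def predict(instances, host="localhost", port=8501, model="my_model"):
    """Send a batch of feature rows to a TensorFlow Serving model and return its predictions."""
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    response = requests.post(url, json={"instances": instances}, timeout=1.0)
    response.raise_for_status()
    return response.json()["predictions"]

if __name__ == "__main__":
    # Example: two rows of a 3-feature input.
    print(predict([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))

The talk's design questions (scaling, traffic shifts, latency, easy experimentation) all sit behind an interface like this one.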
