Description
Title High-Performance Distributed Tensorflow: Request Batching and Model Post-Processing Optimizations
Filmed at PyData London 2017
Description In this completely demo-based talk, Chris will demonstrate various techniques for post-processing and optimizing trained TensorFlow models to reduce deployment size and increase prediction performance.
Abstract In this completely demo-based talk, Chris will demonstrate various techniques for post-processing and optimizing trained TensorFlow models to reduce deployment size and increase prediction performance.
First, using techniques such as 8-bit quantization, weight rounding, and batch-normalization folding, we'll simplify the forward-propagation and prediction path.
Next, we'll load-test and compare our optimized and unoptimized models, both with and without request batching enabled.
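Why batching helps can be sketched with a toy timing model. This is not the talk's load-test setup; the per-request overhead and per-item cost figures below are illustrative assumptions, and predict is a stand-in for a real model server call, chosen only to show how batching amortizes fixed per-request overhead:

```python
import time

# Assumed costs (illustrative, not measured): every request pays a fixed
# overhead (RPC, session setup), so batching amortizes it across examples.
REQUEST_OVERHEAD_S = 0.001   # assumed per-request overhead
PER_ITEM_COST_S = 0.0001     # assumed per-example compute cost

def predict(batch):
    """Simulated prediction call: fixed overhead plus per-item cost."""
    time.sleep(REQUEST_OVERHEAD_S + PER_ITEM_COST_S * len(batch))
    return [x * 2 for x in batch]  # dummy "inference"

def predict_unbatched(items):
    # One request per example: pays the overhead len(items) times.
    return [predict([x])[0] for x in items]

def predict_batched(items, batch_size=32):
    # One request per batch: pays the overhead once per batch.
    out = []
    for i in range(0, len(items), batch_size):
        out.extend(predict(items[i:i + batch_size]))
    return out

items = list(range(256))

t0 = time.perf_counter()
unbatched_result = predict_unbatched(items)
t_un = time.perf_counter() - t0

t0 = time.perf_counter()
batched_result = predict_batched(items)
t_b = time.perf_counter() - t0

print(f"unbatched: {t_un:.3f}s  batched (size 32): {t_b:.3f}s")
```

With these assumed costs, the unbatched path pays the fixed overhead 256 times versus 8 times for the batched path, which is the effect a real load test against a serving endpoint makes visible.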
Last, we'll dive deep into Google's TensorFlow Graph Transform Tool to build custom model-optimization transforms.
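As a rough sketch of how the Graph Transform Tool is invoked (the graph file paths and the input/output node names here are placeholders, not from the talk; the transform names are the tool's built-ins corresponding to the folding, rounding, and quantization steps mentioned above):

```shell
# Build the transform_graph tool from the TensorFlow source tree,
# then apply a pipeline of built-in transforms to a frozen graph.
bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_model.pb \
  --out_graph=optimized_model.pb \
  --inputs='input' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes
    fold_batch_norms
    round_weights(num_steps=256)
    quantize_weights'
```

Custom transforms are written in C++ against the same registration mechanism the built-in transforms use, then named in the --transforms list like any other step.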