Description
In this fully demo-based talk, Chris will demonstrate various techniques to post-process and optimize trained TensorFlow models to reduce deployment size and increase prediction performance.
First, we'll apply techniques such as 8-bit quantization, weight rounding, and batch-normalization folding to simplify the forward-propagation path used for prediction.
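To make two of these techniques concrete, here is a minimal pure-Python sketch. It is not the code used in the talk and not how TensorFlow implements these transforms. Batch-normalization folding rewrites a layer's weights and bias so that the separate batch-norm op can be dropped at prediction time. Min/max 8-bit quantization stores each weight as a single byte plus a shared scale and offset. The function names (`fold_batch_norm`, `quantize_8bit`, `dequantize_8bit`, `dense`), the dense-layer layout, and the sample numbers are all hypothetical, chosen for illustration.

```python
import math

def dense(x, weights, bias):
    # Plain dense layer: weights[j][i] connects input i to output channel j.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    # Hypothetical helper: fold per-channel batch-norm parameters into the
    # preceding dense layer so that dense(x, folded) == batch_norm(dense(x, original)).
    folded_w, folded_b = [], []
    for j, row in enumerate(weights):
        scale = gamma[j] / math.sqrt(var[j] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((bias[j] - mean[j]) * scale + beta[j])
    return folded_w, folded_b

def quantize_8bit(values):
    # Hypothetical helper: linear (min/max) quantization to integer codes 0..255,
    # returning the scale and offset needed to map codes back to floats.
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_8bit(codes, scale, offset):
    return [c * scale + offset for c in codes]

# Sample layer and batch-norm statistics (made-up numbers).
x = [1.0, -2.0]
W = [[0.5, 0.25], [-1.0, 2.0]]
b = [0.1, -0.3]
gamma, beta = [2.0, 0.5], [0.0, 1.0]
mean, var = [0.2, -0.1], [4.0, 1.0]
eps = 1e-3

# Unfolded: dense layer followed by a separate batch-norm op.
z = dense(x, W, b)
bn = [g * (zi - m) / math.sqrt(v + eps) + be
      for zi, g, be, m, v in zip(z, gamma, beta, mean, var)]

# Folded: one dense layer with rewritten weights and bias, no batch-norm op.
Wf, bf = fold_batch_norm(W, b, gamma, beta, mean, var, eps)
folded = dense(x, Wf, bf)
print(all(abs(a - c) < 1e-9 for a, c in zip(bn, folded)))  # True

# 8-bit quantization: each restored weight lies within half a step of the original.
w = [-1.0, 0.0, 0.5, 1.0]
codes, scale, offset = quantize_8bit(w)
restored = dequantize_8bit(codes, scale, offset)
print(max(abs(r - v) for r, v in zip(restored, w)) <= scale / 2 + 1e-12)  # True
```

Folding removes one op per layer from the forward pass, while quantization shrinks stored weights roughly fourfold (32-bit floats to 8-bit codes) at the cost of a bounded rounding error.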
Next, we'll load-test and compare the optimized and unoptimized models, with request batching both enabled and disabled.
Last, we'll dive deep into Google's TensorFlow Graph Transform Tool to build custom model-optimization functions.