High Performance Distributed Tensorflow

Description

In this completely demo-based talk, Chris will demonstrate various techniques to post-process and optimize trained TensorFlow AI models to reduce deployment size and increase prediction performance.

First, using techniques such as 8-bit quantization, weight rounding, and batch-normalization folding, we'll simplify the path of forward propagation and prediction.
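
As a rough illustration of two of these techniques, the sketch below folds batch-normalization parameters into the weights and bias of a preceding dense layer, then linearly quantizes a list of weights to 8-bit codes. It is plain Python written for this description, not code from the talk or from TensorFlow; the function names (fold_batch_norm, quantize_8bit, and so on), the eps default, and the sample numbers are all assumptions chosen for illustration.

```python
import math

def dense(weights, bias, x):
    """Dense layer: weights[j][i] connects input i to output j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batch_norm(y, gamma, beta, mean, variance, eps=1e-3):
    """Inference-time batch normalization applied per output unit."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, variance)]

def fold_batch_norm(weights, bias, gamma, beta, mean, variance, eps=1e-3):
    """Fold batch-norm parameters into the dense layer that precedes them.

    After folding, dense(folded_w, folded_b, x) matches
    batch_norm(dense(weights, bias, x), ...) up to floating-point error,
    so the separate normalization step can be dropped at prediction time.
    """
    folded_w, folded_b = [], []
    for j, row in enumerate(weights):
        scale = gamma[j] / math.sqrt(variance[j] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((bias[j] - mean[j]) * scale + beta[j])
    return folded_w, folded_b

def quantize_8bit(values):
    """Map floats onto integer codes 0..255 using the min/max range."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    """Recover approximate floats from 8-bit codes."""
    return [lo + c * scale for c in codes]

if __name__ == "__main__":
    # Hypothetical layer parameters, for illustration only.
    W = [[0.5, -1.0], [2.0, 0.25]]
    b = [0.1, -0.2]
    gamma, beta = [1.5, 0.5], [0.0, 1.0]
    mean, variance = [0.2, -0.1], [4.0, 0.25]
    x = [1.0, 2.0]

    fw, fb = fold_batch_norm(W, b, gamma, beta, mean, variance)
    print("unfolded:", batch_norm(dense(W, b, x), gamma, beta, mean, variance))
    print("folded:  ", dense(fw, fb, x))

    codes, lo, scale = quantize_8bit([w for row in fw for w in row])
    print("8-bit codes:", codes)
```

Folding removes one operation from the forward pass without changing the result, while quantization trades a small, bounded rounding error (at most half a quantization step per weight) for a model that stores one byte per weight instead of four.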

Next, we'll load-test and compare our optimized and unoptimized models, with request batching both enabled and disabled.

Last, we'll dive deep into Google's TensorFlow Graph Transform Tool to build custom model optimization functions.
