Description
Title High-Performance Distributed Tensorflow: Request Batching and Model Post-Processing Optimizations
Filmed at PyData London 2017
Description In this completely demo-based talk, Chris will demonstrate various techniques for post-processing and optimizing trained TensorFlow models to reduce deployment size and increase prediction performance.
Abstract In this completely demo-based talk, Chris will demonstrate various techniques for post-processing and optimizing trained TensorFlow models to reduce deployment size and increase prediction performance.
First, using techniques such as 8-bit quantization, weight rounding, and batch-normalization folding, we'll simplify the forward-propagation and prediction path.
Next, we'll load-test and compare our optimized and unoptimized models, both with and without request batching enabled.
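Why batching helps can be sketched with a toy timing model. This is not the talk's load-test setup; the per-request overhead and per-item cost figures below are illustrative assumptions, and predict is a stand-in for a real model server call, chosen only to show how batching amortizes fixed per-request overhead:

```python
import time

# Assumed costs (illustrative, not measured): every request pays a fixed
# overhead (RPC, session setup), so batching amortizes it across examples.
REQUEST_OVERHEAD_S = 0.001   # assumed per-request overhead
PER_ITEM_COST_S = 0.0001     # assumed per-example compute cost

def predict(batch):
    """Simulated prediction call: fixed overhead plus per-item cost."""
    time.sleep(REQUEST_OVERHEAD_S + PER_ITEM_COST_S * len(batch))
    return [x * 2 for x in batch]  # dummy "inference"

def predict_unbatched(items):
    # One request per example: pays the overhead len(items) times.
    return [predict([x])[0] for x in items]

def predict_batched(items, batch_size=32):
    # One request per batch: pays the overhead once per batch.
    out = []
    for i in range(0, len(items), batch_size):
        out.extend(predict(items[i:i + batch_size]))
    return out

items = list(range(256))

t0 = time.perf_counter()
unbatched_result = predict_unbatched(items)
t_un = time.perf_counter() - t0

t0 = time.perf_counter()
batched_result = predict_batched(items)
t_b = time.perf_counter() - t0

print(f"unbatched: {t_un:.3f}s  batched (size 32): {t_b:.3f}s")
```

With these assumed costs, the unbatched path pays the fixed overhead 256 times versus 8 times for the batched path, which is the effect a real load test against a serving endpoint makes visible.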
Last, we'll dive deep into Google's TensorFlow Graph Transform Tool to build custom model-optimization transforms.
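As a rough sketch of how the Graph Transform Tool is invoked (the graph file paths and the input/output node names here are placeholders, not from the talk; the transform names are the tool's built-ins corresponding to the folding, rounding, and quantization steps mentioned above):

```shell
# Build the transform_graph tool from the TensorFlow source tree,
# then apply a pipeline of built-in transforms to a frozen graph.
bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_model.pb \
  --out_graph=optimized_model.pb \
  --inputs='input' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes
    fold_batch_norms
    round_weights(num_steps=256)
    quantize_weights'
```

Custom transforms are written in C++ against the same registration mechanism the built-in transforms use, then named in the --transforms list like any other step.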