
Deploying ML Solutions With Low Latency In Python


By Aditya Lohia

When we aim for better accuracy, we sometimes forget that our algorithms grow larger and slower, which can make them unusable in real-time scenarios. How do you deploy your solution? Which framework should you use? Can you use Python to deploy it? Can you use a Jetson Nano for multi-stream inference? If you are curious about these questions, join me in this talk to discover TensorRT and DeepStream and how they reduce your algorithm's latency and memory footprint.

NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that deliver low latency and high throughput for deep learning inference applications. DeepStream offers a scalable, multi-platform framework with TLS security for deploying on the edge and connecting to any cloud. If you are using a GPU with CUDA/Tensor cores, you can leverage these SDKs to deploy bigger and better algorithms in your real-time scenarios.

The main focus of this talk is to demonstrate why, where, and how to use TensorRT and DeepStream.

Recorded at the 2021 Python Web Conference (

