Description
Scale your applications from a laptop to a cluster with ease
Ray (http://ray.io) is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications from a laptop to a cluster. While broadly applicable, it was developed to solve the unique performance challenges of ML/AI systems, such as the heterogeneous task scheduling and state management required for hyperparameter tuning and model training, running simulations when training reinforcement learning (RL) models, and model serving. Ray is now used in many production deployments.
I'll explain the problems that Ray solves for cluster-wide scaling of general Python applications and for specific examples, like RL workloads. Ray’s features include rapid scheduling and execution of “tasks” and management of distributed state, such as model parameters during training. I'll compare Ray to other libraries for distributed Python.
This talk is for you if you need to scale your Python applications to a cluster and you want a robust, yet easy-to-use API to do it. You don't need to be a distributed systems expert to use Ray. You'll learn when to use Ray versus alternatives, how it’s used in several open source systems, and how to use it in your projects.