Description
PyData Amsterdam 2017
In this talk I'd like to give a practical introduction to deep reinforcement learning methods, which are used to solve complex control problems in robotics, play Atari games, control self-driving cars, and much more.
Deep Reinforcement Learning is a very hot topic, successfully applied in many areas that require planning actions in complex, noisy, and partially observed environments. Concrete examples range from playing arcade games and navigating websites to helicopter, quadcopter, and car control, and even protein folding.
Surprisingly, during my own exploration of this broad topic, I discovered that, with rare exceptions, there is a lack of concrete, understandable explanations of the most successful and useful algorithms and methods, such as Deep Q-Networks (DQN), Policy Gradients (PG), and Asynchronous Advantage Actor-Critic (A3C). The situation is even worse when it comes to simple code examples of these methods.
On one side, there are lots of scientific papers on arxiv.org where researchers refine ideas and methods. On the other side, there are a couple of full-sized open-source projects implementing those methods, plus dozens of "tricks" to improve their stability and performance.
In this talk, I'll try to fill the gap between the two by showing the intuition behind the math and demonstrating how those three approaches (DQN, PG, and A3C) can each be implemented in fewer than 200 lines of Python code using Keras.
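
To give a flavor of how compact such implementations can be, here is a minimal sketch (not code from the talk itself) of the kind of small Keras Q-network a DQN agent builds on; the state_dim and n_actions values and layer sizes are illustrative assumptions:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    state_dim, n_actions = 4, 2  # illustrative sizes (assumption), e.g. a CartPole-like task

    # Q-network: maps an observation vector to one Q-value estimate per action
    model = Sequential([
        Dense(64, activation='relu', input_shape=(state_dim,)),
        Dense(64, activation='relu'),
        Dense(n_actions)
    ])
    model.compile(optimizer='adam', loss='mse')

    # Greedy action selection for a single observation
    state = np.zeros((1, state_dim))
    action = int(np.argmax(model.predict(state)[0]))

A full DQN adds pieces such as an experience replay buffer, a target network, and an epsilon-greedy exploration schedule on top of a core like this, which is how the whole agent stays within a couple hundred lines.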