Daniel Russo - Global Optimality Guarantees for Policy Gradient Methods

From Katie Gentilello March 13th, 2020

Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I’ll also discuss (1) convergence rates that follow as a consequence of this theory and (2) consequences of this theory for policy gradient performed with highly expressive policy classes.

* This talk is based on ongoing joint work with Jalaj Bhandari.

Tags: machine learning, ml@gt seminar series, georgia tech, recorded by gtlibrary, spring 2020

Name: Daniel Russo
Date: March 11th, 2020
Appears In: Open Scholarship
