Linglong Kong

Thursday, February 24, 2022, 4pm

Zoom

Linglong Kong

Mathematical and Statistical Sciences

University of Alberta

Statistics and Optimization in Reinforcement Learning

Reinforcement learning (RL) is a mathematical framework for developing intelligent agents that learn, by interacting with an environment, the behaviour that maximizes cumulative reward. RL has been applied successfully in many fields, and statistics and optimization are becoming important tools for it. In this talk, we will look at two of our recent developments. In the first, we employ distributional RL for efficient exploration. In distributional RL, the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components: a decaying schedule to suppress the intrinsic uncertainty, and an exploration bonus calculated from the upper quantiles of the learned distribution. In the second, we study damped Anderson mixing for deep RL. Anderson mixing has been applied heuristically to RL algorithms to accelerate convergence and improve the sampling efficiency of deep RL. Motivated by this, we provide a rigorous mathematical justification for the benefits of Anderson mixing in RL. Our main results establish a connection between Anderson mixing and quasi-Newton methods, prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor, and propose a stabilization strategy. Beyond these two examples, we will discuss current progress and future directions for statistics and optimization in RL.
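For readers unfamiliar with Anderson mixing, the following is a minimal sketch of the general idea, not the speaker's algorithm. It applies textbook damped Anderson mixing to a fixed-point problem x = g(x), where g is the policy-evaluation Bellman operator of a small random Markov chain. The function name `anderson_fixed_point`, the memory size `m`, the damping factor `beta`, and the toy problem are all assumptions made for illustration; the talk itself concerns deep RL and policy iteration.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, beta=0.5, tol=1e-8, max_iter=1000):
    """Damped Anderson mixing for x = g(x). Returns (x, number of updates).

    m    : number of past differences kept (m=0 gives plain damped iteration)
    beta : damping factor in (0, 1]
    """
    x = np.asarray(x0, dtype=float)
    xs, fs = [], []                          # recent iterates and residuals
    for k in range(max_iter):
        f = g(x) - x                         # fixed-point residual
        if np.linalg.norm(f) < tol:
            return x, k
        xs.append(x)
        fs.append(f)
        xs, fs = xs[-(m + 1):], fs[-(m + 1):]
        if len(xs) == 1:
            x = x + beta * f                 # no history yet: plain damped step
        else:
            dX = np.diff(np.array(xs), axis=0).T   # columns: iterate differences
            dF = np.diff(np.array(fs), axis=0).T   # columns: residual differences
            # Least-squares coefficients that minimize the mixed residual.
            coef, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = x + beta * f - (dX + beta * dF) @ coef
    return x, max_iter

# Toy problem (hypothetical): evaluate a fixed policy on a random 20-state chain.
rng = np.random.default_rng(0)
n, discount = 20, 0.95
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)            # row-stochastic transition matrix
r = rng.random(n)                            # per-state rewards

def bellman(v):
    return r + discount * P @ v              # policy-evaluation Bellman operator

v_exact = np.linalg.solve(np.eye(n) - discount * P, r)
v_plain, it_plain = anderson_fixed_point(bellman, np.zeros(n), m=0, beta=1.0)
v_aa, it_aa = anderson_fixed_point(bellman, np.zeros(n), m=5, beta=0.5)
print("plain iteration updates:", it_plain)
print("Anderson mixing updates:", it_aa)
```

Setting `m=0` and `beta=1.0` recovers ordinary fixed-point iteration, so the two printed counts give a rough sense of the acceleration on this linear toy problem. Behaviour on nonlinear operators, such as those arising in deep RL, can differ, which is where damping and stabilization become relevant.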

