Statistics and Optimization in Reinforcement Learning
Reinforcement learning (RL) is a mathematical framework for developing intelligent agents that learn, by interacting with an environment, the behaviour that maximizes cumulative reward. It has been applied successfully in many fields, and statistics and optimization are becoming important tools for RL. In this talk, we will look at two of our recent developments. In the first example, we employ distributional RL for efficient exploration. In distributional RL, the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components: a decaying schedule to suppress the intrinsic uncertainty, and an exploration bonus calculated from the upper quantiles of the learned distribution. In the second example, we study damped Anderson mixing for deep RL. Anderson mixing has been applied heuristically to RL algorithms to accelerate convergence and improve the sample efficiency of deep RL. Motivated by that, we provide a rigorous mathematical justification for the benefits of Anderson mixing in RL. We establish a connection between Anderson mixing and quasi-Newton methods, prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor, and propose a stabilization strategy. Beyond these two examples, we will discuss current progress and future directions for statistics and optimization in RL.
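As a rough illustration of the second example, the following is a minimal sketch of textbook damped Anderson mixing applied to a fixed-point iteration. It is not the algorithm or the stabilization strategy from the talk. The function name anderson_mixing, the parameters memory and beta, and the small random policy-evaluation problem are all assumptions made for this sketch; a tabular, linear Bellman operator stands in for the deep RL setting.

```python
import numpy as np


def anderson_mixing(g, x0, memory=5, beta=1.0, tol=1e-8, max_iter=5000):
    """Damped Anderson mixing for the fixed-point problem x = g(x).

    memory: number of past iterate/residual differences kept
            (memory=0 reduces to the plain damped iteration).
    beta:   damping (mixing) parameter in (0, 1].
    Returns the final iterate and the number of update steps taken.
    """
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                      # residual of the fixed-point map
    dX, dF = [], []                   # histories of iterate / residual differences
    for k in range(max_iter):
        if np.linalg.norm(f) < tol:
            return x, k
        if dX:
            X = np.column_stack(dX)
            F = np.column_stack(dF)
            # Least-squares coefficients minimizing ||f - F @ gamma||_2
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new = x + beta * f - (X + beta * F) @ gamma
        else:
            x_new = x + beta * f      # plain damped fixed-point step
        f_new = g(x_new) - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        if len(dX) > memory:          # keep only the most recent differences
            dX.pop(0)
            dF.pop(0)
        x, f = x_new, f_new
    return x, max_iter


# Hypothetical test problem: policy evaluation for a fixed policy on a
# random 20-state Markov chain, v = r + discount * P v.
rng = np.random.default_rng(0)
n_states, discount = 20, 0.95
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)     # make each row a probability distribution
r = rng.random(n_states)


def bellman(v):
    return r + discount * P @ v


v_exact = np.linalg.solve(np.eye(n_states) - discount * P, r)
v_plain, iters_plain = anderson_mixing(bellman, np.zeros(n_states), memory=0)
v_aa, iters_aa = anderson_mixing(bellman, np.zeros(n_states), memory=5, beta=0.9)

print("plain iteration steps:", iters_plain)
print("Anderson mixing steps:", iters_aa)
print("max error (Anderson): ", np.max(np.abs(v_aa - v_exact)))
```

Each step fits the current residual with a combination of recent residual differences and then takes a damped step from the extrapolated point; this secant-type use of past differences is the usual way to see the link to quasi-Newton methods. The sketch omits any safeguarding, so on nonlinear or non-smooth operators it may need a stabilization step of the kind the talk proposes.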