Session Detail Information

Cluster: INFORMS Computing Society

Session Information: Monday, Nov 08, 08:00 - 09:30

Title: Advances in Approximate Dynamic Programming and Reinforcement Learning - I
Chair: Marek Petrik, University of Massachusetts Amherst, 140 Governors Drive, Amherst MA 01003, United States of America, petrik@cs.umass.edu

Abstract Details

Title: Recent Progress in Off-policy Reinforcement Learning
 Presenting Author: Csaba Szepesvari, University of Alberta, Athabasca Hall 3-11, Edmonton, Canada, szepesva@ualberta.ca
 
Abstract: Temporal-difference (TD) based reinforcement learning (RL) algorithms might diverge when used in off-policy settings, seriously limiting their applicability. Recently, Sutton, Szepesvari and Maei introduced a new trick that allows one to design RL algorithms which inherit the desirable properties of TD algorithms while avoiding their problems. In this talk I will describe this trick and how it can be used to design new algorithms in reinforcement learning and beyond.
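The abstract does not spell out the construction, but the gradient-TD family it refers to includes updates of the TDC form, which add a second weight vector to correct the TD update off-policy. The sketch below is illustrative only: the feature dimension, stepsizes, and importance-sampling ratio are made-up values, and the transitions are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                      # feature dimension (illustrative)
theta = np.zeros(d)        # value-function weights
w = np.zeros(d)            # auxiliary weights for the gradient correction
alpha, beta = 0.05, 0.1    # stepsizes for theta and w (hypothetical)
gamma = 0.9                # discount factor

def tdc_update(phi, r, phi_next, rho):
    """One TDC-style update. rho is the importance-sampling ratio between
    target and behavior policies; rho = 1 recovers ordinary on-policy TD(0)."""
    global theta, w
    delta = r + gamma * theta @ phi_next - theta @ phi     # TD error
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * rho * (delta - w @ phi) * phi

# A few synthetic off-policy transitions (rho > 1 emphasizes them)
for _ in range(200):
    tdc_update(phi=rng.random(d), r=rng.random(),
               phi_next=rng.random(d), rho=1.5)
```

Unlike plain off-policy TD, the corrected update follows the gradient of a projected Bellman error objective, which is what yields the convergence guarantees the talk discusses.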
  
Title: Optimal Stepsizes in Approximate Value Iteration
 Presenting Author: Ilya Ryzhov, Operations Research and Financial Engineering, Princeton University, Princeton NJ 08540, United States of America, iryzhov@princeton.edu
 Co-Author: Peter Frazier, Assistant Professor, Cornell University, 232 Rhodes Hall, Ithaca NY 14853, United States of America, pf98@cornell.edu
 Warren Powell, Professor, Princeton University, Sherrerd Hall, Princeton NJ 08544, United States of America, powell@princeton.edu
 
Abstract: In approximate dynamic programming, random observations are smoothed to estimate the value of being in a state, leading to the problem of choosing a stepsize. We propose a new stepsize rule that is optimal for a basic ADP problem. This is the first rule to explicitly consider the covariance between the current observation and the previous prediction in ADP. We extend the rule to general ADP settings; experimental results show that it produces fast convergence without the need for tuning.
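The smoothing step the abstract describes can be made concrete with a minimal sketch. The proposed optimal rule itself is not given in the abstract, so the example below uses the common harmonic baseline alpha_n = 1/n; the target value and noise level are invented for illustration.

```python
import random

random.seed(1)
v = 0.0                  # current estimate of the value of being in a state
true_value = 10.0        # unknown quantity the noisy observations center on

for n in range(1, 501):
    obs = true_value + random.gauss(0.0, 1.0)  # noisy observation
    alpha = 1.0 / n                            # harmonic stepsize (baseline)
    v = (1 - alpha) * v + alpha * obs          # smoothing update
```

With alpha_n = 1/n this update reduces to the running sample mean, which is optimal for stationary observations; the talk's contribution is a rule that also accounts for the covariance between the current observation and the previous prediction, which matters because ADP observations are not stationary.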
  
Title: Optimization-based Algorithms for Approximate Dynamic Programming
 Presenting Author: Marek Petrik, University of Massachusetts Amherst, 140 Governors Drive, Amherst MA 01003, United States of America, petrik@cs.umass.edu
 Co-Author: Shlomo Zilberstein, University of Massachusetts Amherst, 140 Governors Drive, Amherst MA 01003, United States of America, shlomo@cs.umass.edu
 
Abstract: Most ADP algorithms iteratively approximate the value function. Although these algorithms can achieve impressive results in some domains, they often require extensive parameter tweaking to work well. We present new, more reliable algorithms that use optimization instead of iterative improvement. These optimization-based algorithms are easy to analyze and offer much stronger guarantees. We present experimental results on the management of water discharge from a dam.
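The abstract does not specify the formulation, but "optimization instead of iterative improvement" can be illustrated with the classic linear-programming formulation of an MDP: minimize the sum of state values subject to the Bellman inequalities, then cross-check against value iteration. The tiny two-state MDP below is made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Invented 2-state, 2-action MDP: P[a, s, s'] and per-state rewards R[a, s]
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.1, 0.9], [0.7, 0.3]]])   # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
n = 2

# LP formulation: minimize sum_s v(s)
# subject to v(s) >= r(s,a) + gamma * sum_s' P(s'|s,a) v(s') for all s, a
A_ub, b_ub = [], []
for a in range(2):
    for s in range(n):
        A_ub.append(-np.eye(n)[s] + gamma * P[a, s])  # -v(s) + gamma*P.v <= -r
        b_ub.append(-R[a, s])
res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
v_lp = res.x

# Cross-check: value iteration converges to the same optimal values
v = np.zeros(n)
for _ in range(2000):
    v = np.max(R + gamma * P @ v, axis=0)   # Bellman optimality backup
```

For small exact MDPs the LP solution coincides with the value-iteration fixed point; the appeal of optimization-based formulations in ADP is that, with an approximation architecture added, the resulting program can still be analyzed and bounded directly, rather than hoping an iterative scheme behaves well.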