Technical Program

Paper Detail

Paper: PS-1A.35
Session: Poster Session 1A
Location: Symphony/Overture
Session Time: Thursday, September 6, 16:30 - 18:30
Presentation Time: Thursday, September 6, 16:30 - 18:30
Presentation: Poster
Paper Title: Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards
Authors: Paul Krueger, Thomas Griffiths, Princeton University, United States
Abstract: Model-free (MF) and model-based (MB) reinforcement learning (RL) provide a successful framework for understanding human behavior and neural data. These two systems are usually thought to compete for control of behavior. However, it has also been proposed that they can be integrated cooperatively. The Dyna algorithm uses MB replay of past experience to train the MF system, and has inspired research examining whether human learners do something similar. Here we introduce an approach that links MF and MB learning in a new way: via the reward function. Given a model of the learning environment, dynamic programming is used to iteratively approximate state values that monotonically converge to state values under the optimal decision policy. Pseudorewards are calculated from these values and used to shape the reward function of an MF learner in a way that is guaranteed not to change the optimal policy. We show that this method offers computational advantages over Dyna. It also offers a new way to think about integrating MF and MB RL: our knowledge of the world doesn't just provide a source of simulated experience for training our instincts; it shapes the rewards that those instincts latch onto. We discuss psychological phenomena that this theory could apply to.
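
As a rough illustration of the pipeline the abstract describes, the following Python sketch runs value iteration on a known model, converts the resulting state values into pseudorewards, and adds them to the reward seen by a tabular Q-learner. The abstract does not give the pseudoreward formula; the potential-based form gamma * V(s') - V(s) used here is an assumption, chosen because it is a standard shaping form known to leave the optimal policy unchanged. The five-state chain environment, the function names, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_chain(n_states=5):
    """Hypothetical toy model: a deterministic chain with a goal at the right end.

    Actions: 0 = left, 1 = right. Entering the last state pays reward 1;
    the last state is absorbing with zero reward (treated as terminal).
    """
    n_actions = 2
    goal = n_states - 1
    P = np.zeros((n_states, n_actions, n_states))  # P[s, a, s2]
    R = np.zeros((n_states, n_actions, n_states))  # R[s, a, s2]
    for s in range(goal):
        P[s, 0, max(s - 1, 0)] = 1.0  # left (bumps into the wall at state 0)
        P[s, 1, s + 1] = 1.0          # right
    R[goal - 1, 1, goal] = 1.0
    P[goal, :, goal] = 1.0
    return P, R


def value_iteration(P, R, gamma, n_sweeps):
    """MB step: n_sweeps Bellman optimality backups, starting from V = 0."""
    V = np.zeros(P.shape[0])
    for _ in range(n_sweeps):
        Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    return V


def pseudoreward(V, s, s_next, gamma):
    """Assumed potential-based shaping bonus, with V as the potential."""
    return gamma * V[s_next] - V[s]


def q_learning(P, R, V, gamma, episodes=50, alpha=0.5, epsilon=0.1,
               max_steps=200, seed=0):
    """MF step: tabular Q-learning on the shaped reward."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    goal = n_states - 1
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if s == goal:
                break
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next = int(rng.choice(n_states, p=P[s, a]))
            # Environment reward plus the model-based pseudoreward.
            r = R[s, a, s_next] + pseudoreward(V, s, s_next, gamma)
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q


gamma = 0.9
P, R = make_chain()
V = value_iteration(P, R, gamma, n_sweeps=10)
Q = q_learning(P, R, V, gamma)
print("state values:", V)
print("greedy action per non-terminal state:", Q[:-1].argmax(axis=1))
```

Lowering n_sweeps in this sketch gives cruder value estimates. Under the potential-based assumption the optimal policy is still preserved, as long as the potential is zero at the terminal state, but the MF learner receives less guidance from the model.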