Reinforcement Learning with a Corrupted Reward Channel

Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg

Published in: IJCAI (2017)

Keyphrases

reinforcement learning
function approximation
state space
model free
multi channel
eligibility traces
multi agent
Markov decision processes
reinforcement learning algorithms
machine learning
dynamic programming
average reward
temporal difference
learning algorithm
action selection
communication channels
noise free
transfer learning
reinforcement learning methods
partially observable environments
learning classifier systems
reward function
learning problems
Markov decision process
function approximators
multiple access
supervised learning
total reward