Reinforcement Learning with a Corrupted Reward Channel.
Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg
Published in: IJCAI (2017)
Keyphrases
- reinforcement learning
- function approximation
- state space
- model free
- multi channel
- eligibility traces
- multi agent
- markov decision processes
- reinforcement learning algorithms
- machine learning
- dynamic programming
- average reward
- temporal difference
- learning algorithm
- action selection
- communication channels
- noise free
- transfer learning
- reinforcement learning methods
- partially observable environments
- learning classifier systems
- reward function
- learning problems
- markov decision process
- function approximators
- multiple access
- supervised learning
- total reward