Reinforcement Learning with a Corrupted Reward Channel.
Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
Published in: CoRR (2017)
Keyphrases
- reinforcement learning
- state space
- multi channel
- reinforcement learning algorithms
- function approximation
- multi agent
- model free
- eligibility traces
- machine learning
- reward function
- supervised learning
- optimal policy
- Markov decision processes
- total reward
- multiple access
- temporal difference
- optimal control
- dynamic programming
- learning process
- partially observable environments
- action selection
- channel coding
- learning classifier systems
- learning algorithm
- control policy
- average reward
- Markov decision problems
- action space
- agent learns
- learning capabilities