Reinforcement learning from simultaneous human and MDP reward.
W. Bradley Knox, Peter Stone
Published in: AAMAS (2012)
Keyphrases
- reinforcement learning
- markov decision processes
- reward function
- state space
- optimal policy
- function approximation
- markov decision process
- average reward
- temporal difference
- eligibility traces
- human subjects
- total reward
- reinforcement learning algorithms
- learning algorithm
- action sets
- policy iteration
- state and action spaces
- model free
- machine learning
- multi agent
- discounted reward
- reward signal
- partially observable markov decision processes
- dynamic programming
- learning process
- policy gradient
- bayesian reinforcement learning
- behavioural cloning
- hierarchical reinforcement learning
- dynamic programming algorithms
- learning agent
- partially observable
- learning capabilities
- long run
- optimal control
- markov chain