Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance.
W. Bradley Knox, Peter Stone
Published in: Artif. Intell. (2015)
Keyphrases
- reinforcement learning
- function approximation
- Markov decision processes
- reward function
- reinforcement learning algorithms
- machine learning
- state space
- eligibility traces
- optimal policy
- model free
- average reward
- learning algorithm
- partially observable environments
- learning agent
- temporal data
- behavioural cloning
- human subjects
- temporal difference
- temporal information
- partially observable
- spatial and temporal
- optimal control
- control policy
- temporal constraints
- multi agent
- total reward
- video sequences
- inverse reinforcement learning
- policy gradient
- temporal abstractions
- hidden Markov models
- scale spaces
- policy iteration
- partially observable Markov decision processes
- finite state
- long run