Keyphrases
- total reward
- reinforcement learning
- Markov decision processes
- optimal policy
- average reward
- reinforcement learning algorithms
- action selection
- infinite horizon
- long run
- state space
- decision problems
- model-free
- temporal difference
- dynamic programming
- partially observable Markov decision processes
- optimality criterion
- optimal control
- function approximation
- policy iteration
- cost function
- search space