Keyphrases
- Markov decision processes
- reward function
- dynamic programming
- average reward
- average cost
- reinforcement learning algorithms
- state space
- reinforcement learning
- optimal policy
- finite horizon
- action sets
- policy iteration
- finite state
- optimality criterion
- discounted reward
- stationary policies
- partially observable
- state action
- action space
- control policy
- infinite horizon
- inverse reinforcement learning
- transition matrices
- total reward
- transition model
- control policies
- optimal control
- Markov decision process
- transition probabilities
- approximate dynamic programming
- learning algorithm
- hierarchical reinforcement learning
- state and action spaces
- multi-agent
- multiple agents