A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes.
Thomas Furmston, David Barber
Published in: NIPS (2012)
Keyphrases
- Markov decision processes
- reinforcement learning
- optimal policy
- finite state
- state space
- dynamic programming
- policy iteration
- partially observable
- decision processes
- infinite horizon
- reinforcement learning algorithms
- action space
- reward function
- average cost
- average reward
- Markov decision process
- finite horizon
- planning under uncertainty
- reinforcement learning methods