Policy gradient in Lipschitz Markov Decision Processes.
Matteo Pirotta, Marcello Restelli, Luca Bascetta
Published in: Mach. Learn. (2015)
Keyphrases
- Markov decision processes
- policy gradient
- reinforcement learning algorithms
- average reward
- reinforcement learning
- partially observable Markov decision processes
- actor critic
- policy iteration
- finite state
- optimal policy
- dynamic programming
- state space
- function approximation
- stochastic games
- partially observable
- optimal control
- reinforcement learning methods
- Markov decision process
- infinite horizon
- average cost
- action space
- approximation methods
- gradient method
- long run
- learning tasks
- decision problems
- heuristic search
- sufficient conditions
- variance reduction
- control system
- search algorithm
- multi agent