Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient.
Samuele Tosatto, João Carvalho, Jan Peters. Published in: IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Keyphrases
- policy gradient
- reinforcement learning
- actor-critic
- function approximation
- reinforcement learning algorithms
- policy search
- model-free reinforcement learning
- optimal control
- policy gradient methods
- multi-agent
- Markov decision processes
- state-action
- gradient method
- reinforcement learning methods
- action selection
- temporal difference
- state space
- learning algorithm
- average reward
- partially observable Markov decision processes
- supervised learning
- model-free
- approximate dynamic programming
- approximation methods
- neural network
- variance reduction
- sufficient conditions
- function approximators
- single-agent
- Markov chain
- optimal policy