Variational Policy Gradient Method for Reinforcement Learning with General Utilities.
Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvári, Mengdi Wang
Published in: NeurIPS (2020)
Keyphrases
- gradient method
- reinforcement learning
- actor-critic
- policy gradient
- optimal policy
- image segmentation
- convex formulation
- convergence rate
- state space
- action space
- optimization algorithm
- Markov decision processes
- optimization methods
- optimal control
- action selection
- high dimensional
- optimal solution
- multiscale
- genetic algorithm