Soft-Robust Actor-Critic Policy-Gradient.
Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
Published in: UAI (2018)
Keyphrases
- policy gradient
- actor critic
- reinforcement learning
- optimal control
- gradient method
- function approximation
- neuro fuzzy
- reinforcement learning algorithms
- approximate dynamic programming
- temporal difference
- policy gradient methods
- approximation methods
- neural network
- average reward
- state action
- variance reduction
- policy iteration
- model free
- linear programming