Soft-Robust Actor-Critic Policy-Gradient.

Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

Published in: CoRR (2018)

Keyphrases

policy gradient
actor-critic
reinforcement learning
optimal control
gradient method
function approximation
temporal difference
policy gradient methods
reinforcement learning algorithms
neuro-fuzzy
approximate dynamic programming
average reward
variance reduction
approximation methods
neural network
Markov decision processes
optimization method
single-agent
partially observable Markov decision processes
least squares
multi-agent systems
natural actor-critic