On the Convergence of Policy Gradient in Robust MDPs.

Qiuhao Wang, Chin Pang Ho, Marek Petrik
Published in: CoRR (2022)
Keyphrases
  • policy gradient
  • reinforcement learning
  • policy search
  • Markov decision processes
  • computational complexity
  • reinforcement learning algorithms
  • average reward
  • gradient method