Homotopic policy mirror descent: policy convergence, algorithmic regularization, and improved sample complexity.

Yan Li, Guanghui Lan, Tuo Zhao
Published in: Math. Program. (2024)
Keyphrases
  • sample complexity
  • supervised learning
  • optimal policy
  • theoretical analysis
  • data sets
  • special case
  • generalization error
  • pac learning
  • covering numbers