Homotopic policy mirror descent: policy convergence, algorithmic regularization, and improved sample complexity.
Yan Li
Guanghui Lan
Tuo Zhao
Published in:
Math. Program. (2024)
Keyphrases
sample complexity
supervised learning
optimal policy
theoretical analysis
data sets
special case
generalization error
pac learning
covering numbers