Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity.
Yan Li
Tuo Zhao
Guanghui Lan
Published in: CoRR (2022)
Keyphrases
sample complexity
optimal policy
PAC learning
upper bound
theoretical analysis
learning problems
training examples
VC dimension
data dependent
generalization error
learning experience
data mining
supervised learning
NP-hard
special case
active learning
lower bound