Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity.

Yan Li, Tuo Zhao, Guanghui Lan

Published in: CoRR (2022)

Keyphrases

sample complexity
optimal policy
PAC learning
upper bound
theoretical analysis
learning problems
training examples
VC dimension
data dependent
generalization error
learning experience
data mining
supervised learning
NP-hard
special case
active learning
lower bound