Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.
Shangtong Zhang
Remi Tachet des Combes
Romain Laroche
Published in:
CoRR (2021)
Keyphrases
state space
reinforcement learning
feature space
evolutionary algorithm
special case
sample size