Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch.
Shangtong Zhang
Remi Tachet des Combes
Romain Laroche
Published in:
J. Mach. Learn. Res. (2022)
Keyphrases
reinforcement learning
state space
least squares
Markov random field
linear programming
sample size
global optimization