Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic.

Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha
Published in: CoRR (2024)