Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic.

Published in: CoRR (2024)

Keyphrases