On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs.
Yi Wan
Richard S. Sutton
Published in: CoRR (2022)
Keyphrases
Markov decision processes
average reward
policy iteration
optimal policy
reinforcement learning
computational complexity
semi-Markov decision processes
learning algorithm
learning tasks
convergence speed
model-free
factored MDPs