Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning.
Sheng Zhang
Zhe Zhang
Siva Theja Maguluri
Published in:
NeurIPS (2021)
Keyphrases
TD learning
average reward
temporal difference
evaluation function
optimal policy
reinforcement learning
function approximation
model free
neural network
data mining
machine learning
nearest neighbor
reinforcement learning algorithms