A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants.
Zaiwei Chen
Siva Theja Maguluri
Sanjay Shakkottai
Karthikeyan Shanmugam
Published in:
CoRR (2021)
Keyphrases
TD learning
finite sample
temporal difference
sample size
evaluation function
function approximation
nearest neighbor
reinforcement learning
reinforcement learning algorithms
Monte Carlo
generalization error
model free
action selection
state space
linear combination
neural network
decision making