Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.

Tengyu Xu, Yingbin Liang

Published in: AISTATS (2021)

Keyphrases

reinforcement learning algorithms
reinforcement learning
state space
model free
sample complexity
Markov decision processes
learning algorithm
partially observable Markov decision processes
temporal difference
function approximation
average case
multi agent
dynamic environments
dynamic programming
upper bound
theoretical analysis
support vector machine
learning problems
lower bound
reward function
computational complexity