Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.
Tengyu Xu, Yingbin Liang
Published in: AISTATS (2021)
Keyphrases
- reinforcement learning algorithms
- reinforcement learning
- state space
- model free
- sample complexity
- Markov decision processes
- learning algorithm
- partially observable Markov decision processes
- temporal difference
- function approximation
- average case
- multi agent
- dynamic environments
- dynamic programming
- upper bound
- theoretical analysis
- support vector machine
- learning problems
- lower bound
- reward function
- computational complexity