Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.
Tengyu Xu, Yingbin Liang
Published in: CoRR (2020)
Keyphrases
- reinforcement learning algorithms
- reinforcement learning
- Markov decision processes
- model-free
- sample complexity
- state space
- learning algorithm
- partially observable Markov decision processes
- temporal difference
- function approximation
- dynamic environments
- average case
- reward function
- dynamic programming
- least squares
- supervised learning
- sufficient conditions
- optimal policy
- generalization error
- pairwise
- multi-agent
- decision trees
- machine learning