Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity.
Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
Published in: J. Artif. Intell. Res. (2018)
Keyphrases
- temporal difference learning
- sample complexity
- reinforcement learning
- learning problems
- function approximation
- learning algorithm
- supervised learning
- temporal difference
- theoretical analysis
- fixed point
- reinforcement learning algorithms
- upper bound
- active learning
- game playing
- PAC learning
- special case
- function approximators
- generalization error
- lower bound
- evaluation function
- training examples
- machine learning
- Markov decision process
- multi-agent
- sample size
- state space
- training data
- model free
- learning tasks
- Markov decision processes
- optimal policy
- machine learning algorithms
- model selection
- partially observable