Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity.
Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
Published in: CoRR (2020)
Keyphrases
- temporal difference learning
- sample complexity
- reinforcement learning
- learning problems
- function approximation
- learning algorithm
- supervised learning
- theoretical analysis
- temporal difference
- reinforcement learning algorithms
- PAC learning
- evaluation function
- generalization error
- game playing
- fixed point
- upper bound
- active learning
- special case
- lower bound
- Markov decision process
- training examples
- model-free
- state space
- function approximators
- sample size
- Monte Carlo
- Markov decision processes
- training data
- learning tasks
- transfer learning
- data sets
- optimal policy
- model selection
- Markov random field
- semi-supervised
- policy iteration
- data mining