A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound.
Gal Dalal, Balázs Szörényi, Gugan Thoppe
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- state and action spaces
- upper bound
- lower bound
- Markov decision processes
- function approximation
- machine learning
- unit length
- worst case
- real valued functions
- reinforcement learning algorithms
- multi-agent
- state space
- VC dimension
- neural network
- temporal difference learning
- finite number
- error bounds
- optimal control
- Markov decision process
- learning agents
- learning process
- reinforcement learning methods
- optimal policy
- policy search
- learning algorithm
- dynamic programming