Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank.
Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou. Published in: COLT (2020)
Keyphrases
- reinforcement learning
- function approximation
- Markov decision processes
- temporal difference learning
- reinforcement learning algorithms
- learning tasks
- actor-critic
- average reward
- state space
- partially observable
- learning process
- learning algorithm
- optimal policy
- supervised learning
- temporal difference
- Markov decision process
- policy iteration
- stochastic games
- loss function
- policy evaluation
- reinforcement learning methods
- least squares
- Markov chain
- neural network
- real time dynamic programming
- finite state