Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank.

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Published in: COLT (2020)

Keyphrases

reinforcement learning
function approximation
Markov decision processes
temporal difference learning
reinforcement learning algorithms
learning tasks
actor critic
average reward
state space
partially observable
learning process
learning algorithm
optimal policy
supervised learning
temporal difference
Markov decision process
policy iteration
stochastic games
loss function
policy evaluation
reinforcement learning methods
least squares
Markov chain
neural network
real time dynamic programming
finite state