√n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank.

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Published in: CoRR (2019)

Keyphrases

reinforcement learning
function approximation
markov decision processes
temporal difference learning
reinforcement learning algorithms
learning tasks
learning algorithm
function approximators
partially observable
actor critic
optimal policy
model free
state action
stochastic games
state space
total reward
markov decision process
supervised learning
learning process
finite state
real time dynamic programming
temporal difference
machine learning
action selection
monte carlo
reinforcement learning methods
artificial neural networks
data mining