Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs.

Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar

Published in: CoRR (2020)

Keyphrases

non-stationary
model-free
reinforcement learning
regret bounds
policy iteration
markov decision processes
policy evaluation
average reward
reinforcement learning algorithms
function approximation
online learning
lower bound
rl algorithms
linear regression
state space
temporal difference
upper bound
optimal policy
markov decision problems
machine learning
empirical mode decomposition
learning algorithm
markov decision process
partially observable markov decision processes
dynamic programming
learning process
action space
optimal control