Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning.
Zihan Zhang
Yuhang Jiang
Yuan Zhou
Xiangyang Ji
Published in:
NeurIPS (2022)
Keyphrases
reinforcement learning
regret bounds
multi-armed bandit
state space
machine learning
learning algorithm
learning process
upper bound
multi-class
linear regression
model-free
Markov decision processes
temporal difference