Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning.
Zihan Zhang
Yuhang Jiang
Yuan Zhou
Xiangyang Ji
Published in:
CoRR (2022)
Keyphrases
reinforcement learning
regret bounds
multi-armed bandit
learning algorithm
state space
Markov decision processes
optimal policy
machine learning
lower bound
active learning
upper bound