Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning.

Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji
Published in: CoRR (2022)
Keyphrases
  • reinforcement learning
  • regret bounds
  • multi-armed bandit
  • learning algorithm
  • state space
  • Markov decision processes
  • optimal policy
  • machine learning
  • lower bound
  • active learning
  • upper bound