Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning.
Christoph Dann
Teodor Vanislavov Marinov
Mehryar Mohri
Julian Zimmert
Published in:
NeurIPS (2021)
Keyphrases
reinforcement learning
function approximation
multi-armed bandit
function approximators
state space
Markov decision processes
regret bounds
machine learning
learning algorithm
objective function
support vector machine
optimal policy
learning theory
temporal difference