Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning.

Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert
Published in: CoRR (2021)