Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs.

Published in: CoRR (2021)

Keyphrases