Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Taisuke Kobayashi
Published in: CoRR (2023)