Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward.
Taisuke Kobayashi
Published in: CoRR (2023)
Keyphrases
temporal difference learning
reinforcement learning
function approximation
fixed point
state space
temporal difference
state variables
Markov decision process
neural network
training set
Markov decision processes
evaluation function
function approximators