Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward.

Taisuke Kobayashi

Published in: CoRR (2023)

Keyphrases

temporal difference learning
reinforcement learning
function approximation
fixed point
state space
temporal difference
state variables
Markov decision process
neural network
training set
Markov decision processes
evaluation function
function approximators