Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient.
Dongming Wu
Xingping Dong
Jianbing Shen
Steven C. H. Hoi
Published in:
IEEE Trans. Neural Networks Learn. Syst. (2020)
Keyphrases
policy gradient
variance reduction
estimation error
parametric optimization
reinforcement learning
actor critic
neural network
Monte Carlo
dynamic environments
parameter estimation
mathematical model
evaluation function
function approximation
average cost
importance sampling
gradient method