Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient.
Dongming Wu
Xingping Dong
Jianbing Shen
Steven C. H. Hoi
Published in:
IEEE Trans. Neural Networks Learn. Syst. (2020)
Keyphrases
policy gradient
variance reduction
estimation error
parametric optimization
reinforcement learning
actor critic
neural network
monte carlo
dynamic environments
parameter estimation
mathematical model
evaluation function
function approximation
average cost
importance sampling
gradient method