Generalized Off-Policy Actor-Critic.
Shangtong Zhang
Wendelin Boehmer
Shimon Whiteson
Published in:
CoRR (2019)
Keyphrases
actor critic
reinforcement learning
temporal difference
optimal control
policy gradient
neuro fuzzy
function approximation
average reward
gradient method
neural network
decision making
dynamic programming
approximate dynamic programming