Softmax Deep Double Deterministic Policy Gradients.
Ling Pan
Qingpeng Cai
Longbo Huang
Published in:
NeurIPS (2020)
Keyphrases
fluid model
black box
optimal policy
Markov decision process
fully observable
action selection
objective function
policy iteration
temporal difference learning
policy making