Deterministic Value-Policy Gradients.

Qingpeng Cai, Ling Pan, Pingzhong Tang

Published in: AAAI (2020)

Keyphrases

fluid model
optimal policy
black box
fully observable
neural network
relaxation algorithm
action selection
image gradient
real time
information retrieval
case study
multi agent
lower bound
least squares
control policies