Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods.

Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li

Published in: CoRR (2022)

Keyphrases

reinforcement learning
actor critic
learning algorithm
reinforcement learning methods
learning tasks
function approximation
action selection
reinforcement learning algorithms
state space
temporal difference
function approximators
gradient method
cost function
temporal difference learning