Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation
Taisuke Kobayashi
Takumi Aotani
Julio Rogelio Guadarrama-Olvera
Emmanuel C. Dean-Leon
Gordon Cheng
Published in:
ICDL-EpiRob (2019)
Keyphrases
dynamic programming
Monte Carlo
optimal solution
computational complexity
cost function
path planning
reinforcement learning
objective function
NP-hard
simulated annealing
convergence proof
temporal difference
average reward
actor critic