Learning a dynamic policy by using policy gradient: application to biped walking.
Takamitsu Matsubara
Jun Morimoto
Jun Nakanishi
Masa-aki Sato
Kenji Doya
Published in:
Systems and Computers in Japan (2007)
Keyphrases
policy gradient
actor-critic
biped walking
reinforcement learning
model-free reinforcement learning
learning algorithm
policy search
policy gradient methods
optimal policy
optimal control
gradient method
neural network
function approximation
average reward