Learning a dynamic policy by using policy gradient: application to biped walking.

Published in: Systems and Computers in Japan (2007)

Keyphrases