Episodic Policy Gradient Training.

Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

Published in: AAAI (2022)

Keyphrases

policy gradient
training set
function approximation
parametric optimization
machine learning
multi agent
multi agent systems
support vector machine
average reward
gradient method
actor critic