Policy Gradient using Weak Derivatives for Reinforcement Learning.

Sujay Bhatt, Alec Koppel, Vikram Krishnamurthy

Published in: CISS (2019)

Keyphrases

policy gradient
reinforcement learning
actor critic
function approximation
reinforcement learning algorithms
policy search
optimal control
gradient method
policy gradient methods
model free
reinforcement learning methods
model free reinforcement learning
state space
average reward
single agent
state action
partially observable Markov decision processes
approximation methods
function approximators
machine learning
policy iteration
Markov decision process
variance reduction
temporal difference
Markov decision processes
optimal policy
dynamic programming
multi agent
learning algorithm