Off-Policy Actor-Critic with Emphatic Weightings.

Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

Published in: J. Mach. Learn. Res. (2023)

Keyphrases

actor-critic
reinforcement learning
optimal control
temporal difference
neuro-fuzzy
policy gradient
approximate dynamic programming
gradient method
policy iteration
reinforcement learning algorithms
function approximation
evaluation function
average reward
dynamical systems
neural network
dynamic programming
NP-hard
learning algorithm
multi-agent
model-free
fixed point
mathematical model
optimal policy
optimization problems