Loss Dynamics of Temporal Difference Reinforcement Learning

Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan

Published in: NeurIPS (2023)

Keyphrases

temporal difference
reinforcement learning
function approximation
td learning
evaluation function
model free
temporal difference learning
reinforcement learning algorithms
policy evaluation
monte carlo
actor critic
temporal difference methods
function approximators
step size
dynamical systems
action selection
policy iteration
state space
supervised learning
learning algorithm
optimal policy
evolutionary algorithm
policy search
reinforcement learning problems
data mining
neural network