Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards.

Semih Cayci, Atilla Eryilmaz

Published in: NeurIPS (2023)

Keyphrases

temporal difference learning
heavy tailed
reinforcement learning
function approximation
fixed point
evaluation function
game playing
temporal difference
reinforcement learning algorithms
machine learning
markov decision processes
training data
dynamic programming
worst case
markov decision process