On Double Descent in Reinforcement Learning with LSTD and Random Features.

David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse

Published in: CoRR (2023)

Keyphrases

reinforcement learning
temporal difference
function approximation
state space
learning algorithm
policy evaluation
optimal policy
Markov decision processes
temporal difference learning
least squares
Monte Carlo
model-free
multi-agent
control problems
TD learning
action selection
reinforcement learning methods