A Tale of Sampling and Estimation in Discounted Reinforcement Learning.

Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

Published in: CoRR (2023)

Keyphrases

reinforcement learning
Markov decision processes
optimal policy
dynamic programming
Markov decision process
average reward
importance sampling
estimation error
function approximation
reinforcement learning algorithms
estimation accuracy
temporal difference learning
estimation algorithm
finite state
finite horizon
action selection
multi-agent
partially observable
reinforcement learning methods
total reward
learning algorithm
sampling algorithm
temporal difference
long run
random sampling
optimal control
parameter estimation
supervised learning