A Tale of Sampling and Estimation in Discounted Reinforcement Learning.
Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- Markov decision processes
- optimal policy
- dynamic programming
- Markov decision process
- average reward
- importance sampling
- estimation error
- function approximation
- reinforcement learning algorithms
- estimation accuracy
- temporal difference learning
- estimation algorithm
- finite state
- finite horizon
- action selection
- multi agent
- partially observable
- reinforcement learning methods
- total reward
- learning algorithm
- sampling algorithm
- temporal difference
- long run
- random sampling
- optimal control
- parameter estimation
- supervised learning