Policy Evaluation with Delayed, Aggregated Anonymous Feedback.
Guilherme Dinis Junior, Sindri Magnússon, Jaakko Hollmén
Published in: DS (2022)
Keyphrases
- policy evaluation
- least squares
- temporal difference
- reinforcement learning
- Monte Carlo
- model-free
- matrix inversion
- Markov decision processes
- function approximation
- policy iteration
- variance reduction
- semi-parametric
- statistical inference
- evaluation function
- optimal policy
- dynamical systems
- utility function
- regression model