Off-policy evaluation for tabular reinforcement learning with synthetic trajectories.
Weiwei Wang, Yuqiang Li, Xianyi Wu
Published in: Stat. Comput. (2024)
Keyphrases
- policy evaluation
- reinforcement learning
- temporal difference
- least squares
- model-free
- Markov decision processes
- function approximation
- policy iteration
- Monte Carlo
- variance reduction
- TD learning
- state space
- reinforcement learning algorithms
- moving objects
- semi-parametric
- optimal policy
- evaluation function
- multi-agent
- statistical inference
- action selection
- Bayesian inference
- reinforcement learning methods
- graphical models
- optical flow