Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes.
Chuhan Xie, Wenhao Yang, Zhihua Zhang
Published in: ICML (2023)
Keyphrases
- markov decision processes
- policy evaluation
- policy iteration
- least squares
- reinforcement learning
- finite state
- state space
- optimal policy
- dynamic programming
- model free
- temporal difference
- monte carlo
- semi parametric
- average reward
- partially observable
- average cost
- markov decision process
- decision processes
- variance reduction
- reinforcement learning algorithms
- function approximation
- action space
- reward function
- linear model
- planning under uncertainty
- markov decision problems
- linear programming