Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes.
Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang
Published in: CoRR (2024)
Keyphrases
- markov decision processes
- policy evaluation
- policy iteration
- optimal policy
- least squares
- finite state
- state space
- dynamic programming
- reinforcement learning
- monte carlo
- reinforcement learning algorithms
- model free
- temporal difference
- planning under uncertainty
- decision processes
- average reward
- infinite horizon
- partially observable
- average cost
- reward function
- partially observable markov decision processes
- linear programming
- action space
- variance reduction
- function approximation