Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models.
Rui Miao, Zhengling Qi, Xiaoke Zhang
Published in: CoRR (2022)
Keyphrases
- policy evaluation
- partially observable markov decision processes
- reinforcement learning
- markov decision processes
- semi parametric
- dynamic programming
- probabilistic model
- model selection
- optimal policy
- function approximation
- model free
- np hard
- state space
- markov chain
- decision problems
- statistical inference
- multi agent
- learning algorithm