Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models.
Rui Miao
Zhengling Qi
Xiaoke Zhang
Published in:
NeurIPS (2022)
Keyphrases
policy evaluation
partially observable Markov decision processes
Markov decision processes
finite state
least squares
model selection
reinforcement learning
optimal policy
parametric models
learning algorithm
lower bound
search space
dynamic programming
supply chain
policy iteration