Off-Policy Evaluation in Partially Observed Markov Decision Processes.
Yuchen Hu, Stefan Wager
Published in: CoRR (2021)
Keyphrases
- partially observed
- policy evaluation
- markov decision processes
- policy iteration
- reinforcement learning
- state space
- optimal policy
- dynamic programming
- finite state
- planning under uncertainty
- average reward
- partially observable
- least squares
- reinforcement learning algorithms
- average cost
- markov decision process
- decision processes
- model free
- infinite horizon
- heuristic search
- variance reduction
- temporal difference
- partially observable markov decision processes
- fixed point
- markov chain
- search space