Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders.
Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi
Published in: AISTATS (2021)
Keyphrases
- policy evaluation
- infinite horizon
- policy iteration
- markov decision processes
- reinforcement learning
- optimal policy
- finite horizon
- state space
- optimal control
- temporal difference
- markov decision process
- partially observable markov decision processes
- dynamic programming
- partially observable
- long run
- least squares
- finite state
- function approximation
- model free
- markov decision problems
- average cost
- reinforcement learning algorithms
- average reward
- latent variables
- decision problems
- learning algorithm
- lead time
- monte carlo
- dynamical systems
- graphical models
- multi agent
- upper bound
- decision processes