RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation.
Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni
Published in: CoRR (2024)
Keyphrases
- policy evaluation
- reinforcement learning
- markov decision processes
- least squares
- temporal difference
- policy iteration
- monte carlo
- model free
- function approximation
- variance reduction
- td learning
- optimal policy
- reinforcement learning algorithms
- semi parametric
- state space
- learning algorithm
- markov decision problems
- importance sampling
- fixed point
- finite state
- statistical inference
- markov chain
- search space
- dynamic programming
- supervised learning
- latent variables
- partially observable markov decision processes
- np hard
- active learning
- dynamical systems