RL for Latent MDPs: Regret Guarantees and a Lower Bound.
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Published in: NeurIPS (2021)
Keyphrases
- lower bound
- Markov decision processes
- reinforcement learning
- upper bound
- optimal policy
- state space
- Markov decision process
- state and action spaces
- branch and bound algorithm
- branch and bound
- reward function
- action space
- initial state
- reinforcement learning algorithms
- policy iteration
- latent variables
- optimal solution
- policy evaluation
- finite state
- Markov decision problems
- objective function
- worst case
- NP-hard
- decision theoretic planning
- function approximation
- factored MDPs
- partially observable
- total reward
- quality guarantees
- continuous state and action spaces
- model free
- lower and upper bounds
- finite horizon
- average reward
- infinite horizon
- learning algorithm
- multi agent
- continuous state spaces
- regret bounds
- stochastic games
- learning problems
- approximate dynamic programming
- policy search
- online learning
- least squares
- dynamic programming
- search algorithm
- discounted reward
- semi-Markov decision processes
- online algorithms
- machine learning