RL for Latent MDPs: Regret Guarantees and a Lower Bound.
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Published in: CoRR (2021)
Keyphrases
- lower bound
- reinforcement learning
- Markov decision processes
- upper bound
- optimal policy
- state space
- state and action spaces
- Markov decision process
- branch and bound algorithm
- policy iteration
- NP-hard
- reinforcement learning algorithms
- initial state
- action space
- branch and bound
- worst case
- reward function
- total reward
- function approximation
- objective function
- lower and upper bounds
- partially observable
- optimal solution
- latent variables
- Markov decision problems
- average reward
- policy evaluation
- finite state
- dynamic programming
- machine learning
- model-free
- infinite horizon
- regret bounds
- optimal control
- finite horizon
- policy search
- factored MDPs
- continuous state and action spaces
- online algorithms
- average cost
- learning problems
- learning algorithm
- temporal difference
- decision theoretic planning
- Markov chain
- linear programming
- multi-agent