Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies.
Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
Published in: CoRR (2019)
Keyphrases
- model based reinforcement learning
- regret bounds
- lower bound
- Markov decision processes
- optimal policy
- upper bound
- Markov decision problems
- dynamic programming
- partially observable Markov decision processes
- reinforcement learning
- online learning
- worst case
- reward function
- Markov decision process
- infinite horizon
- mixture model
- linear regression
- average cost
- search space
- optimal solution