Exploration-Exploitation in MDPs with Options.
Ronan Fruit, Alessandro Lazaric
Published in: AISTATS (2017)
Keyphrases
- exploration exploitation
- reinforcement learning
- markov decision processes
- active learning
- bandit problems
- state space
- dynamic programming
- relevance feedback
- markov decision problems
- machine learning
- supervised learning
- optimal policy
- markov decision process
- learning process
- reward function
- viewpoint
- learning algorithm
- higher level
- high level