Exploration-Exploitation in MDPs with Options.
Ronan Fruit, Alessandro Lazaric
Published in: CoRR (2017)
Keyphrases
- exploration exploitation
- reinforcement learning
- Markov decision processes
- active learning
- bandit problems
- state space
- optimal policy
- Markov decision process
- Markov decision problems
- factored MDPs
- relevance feedback
- reward function
- machine learning
- supervised learning
- eye movements
- learning process
- pairwise
- learning algorithm