Exploration-Exploitation in MDPs with Options.

Ronan Fruit, Alessandro Lazaric

Published in: AISTATS (2017)

Keyphrases

exploration exploitation
reinforcement learning
markov decision processes
active learning
bandit problems
state space
dynamic programming
relevance feedback
markov decision problems
machine learning
supervised learning
optimal policy
markov decision process
learning process
reward function
viewpoint
learning algorithm
higher level
high level