Exploration-Exploitation in Constrained MDPs.
Yonathan Efroni, Shie Mannor, Matteo Pirotta
Published in: CoRR (2020)
Keyphrases
- exploration exploitation
- reinforcement learning
- markov decision processes
- active learning
- bandit problems
- state space
- machine learning
- factored mdps
- markov decision process
- dynamic programming
- search space
- markov decision problems
- special case
- multi objective
- optimal policy
- lower bound
- feature space
- decision trees
- data mining