On Optimistic versus Randomized Exploration in Reinforcement Learning.
Ian Osband, Benjamin Van Roy. Published in: CoRR (2017)
Keyphrases
- reinforcement learning
- active exploration
- exploration strategy
- action selection
- model based reinforcement learning
- exploration exploitation
- function approximation
- autonomous learning
- state space
- active learning
- reinforcement learning algorithms
- markov decision processes
- supervised learning
- neural network
- exploration exploitation tradeoff
- optimal policy
- optimal control
- learning algorithm
- machine learning
- privacy preserving association rule mining
- balancing exploration and exploitation
- model free
- temporal difference
- control policy
- decision forest
- learning process
- randomized algorithms
- multi agent reinforcement learning
- lower bound
- decision trees
- real time