From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.
Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- exploration exploitation
- exploration strategy
- autonomous learning
- action selection
- boundary conditions
- active learning
- exploration exploitation tradeoff
- bandit problems
- state space
- model free
- function approximation
- Markov decision processes
- unknown environments
- real time
- database
- sufficient conditions
- case study
- genetic algorithm