From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.
Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard
Published in: ICML (2022)
Keyphrases
- reinforcement learning
- autonomous learning
- exploration strategy
- action selection
- exploration exploitation
- exploration exploitation tradeoff
- boundary conditions
- information visualization
- model free
- reinforcement learning algorithms
- mixture model
- multi agent
- function approximation
- real time
- distributed databases
- temporal difference
- Markov decision processes
- unknown environments
- language model
- relevance feedback
- bandit problems
- database systems