Model-free Posterior Sampling via Learning Rate Randomization.
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
Published in: NeurIPS (2023)
Keyphrases
- learning rate
- model free
- reinforcement learning
- learning algorithm
- convergence rate
- Markov chain Monte Carlo
- reinforcement learning algorithms
- rapid convergence
- function approximation
- policy iteration
- adaptive learning rate
- convergence speed
- posterior distribution
- weight vector
- multilayer neural networks
- Monte Carlo
- convergence theorem
- Gaussian process
- average reward
- delta-bar-delta
- optimal policy
- objective function