Model-free Posterior Sampling via Learning Rate Randomization.
Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
Published in: CoRR (2023)
Keyphrases
- model free
- learning rate
- reinforcement learning
- learning algorithm
- markov chain monte carlo
- reinforcement learning algorithms
- convergence rate
- temporal difference
- weight vector
- function approximation
- policy iteration
- rapid convergence
- convergence speed
- multilayer neural networks
- adaptive learning rate
- monte carlo
- convergence theorem
- sample size
- simulated annealing
- differential evolution
- posterior distribution
- average reward