RL with KL penalties is better viewed as Bayesian inference.
Tomasz Korbak, Ethan Perez, Christopher L. Buckley
Published in: EMNLP (Findings) (2022)
Keyphrases
- bayesian inference
- reinforcement learning
- probabilistic model
- hyperparameters
- prior information
- bayesian model
- particle filter
- hierarchical bayesian
- variational inference
- variational bayes
- weighted model counting
- gibbs sampler
- kullback leibler
- statistical inference
- markov chain monte carlo
- optimal policy
- expectation propagation
- kl divergence
- decision trees