RL with KL penalties is better viewed as Bayesian inference.

Tomasz Korbak, Ethan Perez, Christopher L. Buckley
Published in: CoRR (2022)