Tailoring Self-Rationalizers with Multi-Reward Distillation
Sahana Ramnath
Brihi Joshi
Skyler Hallinan
Ximing Lu
Liunian Harold Li
Aaron Chan
Jack Hessel
Yejin Choi
Xiang Ren
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
cooperative
machine learning
long run
data mining
bayesian networks
expert systems
least squares
markov chain
average reward