Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge.

Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar
Published in: CoRR (2024)