WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.

Bill Yuchen Lin, Yuntian Deng, Khyathi Raghavi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

Published in: CoRR (2024)