WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.
Bill Yuchen Lin, Yuntian Deng, Khyathi Raghavi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi
Published in: CoRR (2024)
Keyphrases
- novice users
- user interface
- end users
- user interaction
- user feedback
- real world
- rapid growth
- information sources
- user groups
- user studies
- daily life
- user satisfaction
- working environment
- amazon mechanical turk
- databases
- database
- internet users
- mechanical turk
- user centric
- human users
- peer to peer
- collaborative filtering
- recommender systems
- data sets