Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models.

Wentian Wang, Paul Kantor, Jacob Feldman, Lazaros Gallos, Hao Wang
Published in: CoRR (2024)
Keyphrases