Investigating Data Contamination in Modern Benchmarks for Large Language Models.

Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan
Published in: NAACL-HLT (2024)
Keyphrases
  • language model
  • training data
  • probabilistic model
  • language modeling
  • feature selection
  • speech recognition
  • document retrieval
  • machine learning
  • information retrieval
  • language modelling