Investigating Data Contamination for Pre-training Language Models.
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
Published in: CoRR (2024)
Keyphrases
language model
language modeling
information retrieval
knowledge discovery
training data
context sensitive