Proving Test Set Contamination in Black Box Language Models.
Yonatan Oren, Nicole Meister, Niladri S. Chatterji, Faisal Ladhak, Tatsunori B. Hashimoto
Published in: CoRR (2023)
Keyphrases
- test set
- black box
- language model
- test cases
- language modeling
- error rate
- black boxes
- n gram
- training set
- speech recognition
- white box
- test data
- probabilistic model
- document retrieval
- training data
- retrieval model
- test collection
- statistical language models
- integration testing
- information retrieval
- language modelling
- query terms
- smoothing methods
- query expansion
- white box testing
- relevance model
- pseudo relevance feedback
- machine learning
- software testing
- document ranking
- image database