To the Cutoff... and Beyond? A Longitudinal Perspective on LLM Data Contamination.
Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley
Published in: ICLR (2024)
Keyphrases
- data sets
- image data
- data collection
- raw data
- data structure
- high quality
- data quality
- neural network
- synthetic data
- knowledge discovery
- knowledge base
- probability distribution
- training data
- noisy data
- original data
- application domains
- big data
- data objects
- xml documents
- experimental data
- spatial data
- attribute values
- background knowledge
- data sources
- computer systems
- database
- data mining techniques
- medical images