An Open Source Data Contamination Report for Llama Series Models.
Yucheng Li
Published in: CoRR (2023)
Keyphrases
- open source
- data analysis
- database
- historical data
- prior knowledge
- small number
- data sets
- raw data
- experimental data
- accurate models
- missing data
- synthetic data
- data processing
- training data
- data mining techniques
- knowledge discovery
- data structure
- statistical methods
- neural network
- incomplete data
- high dimensional data
- data collection
- image data
- probability distribution
- statistical analysis
- original data
- data quality
- learning models
- web services
- data mining tools