NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark.
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
Published in: EMNLP (Findings) (2023)
Keyphrases
- data sets
- database
- synthetic data
- data collection
- raw data
- high quality
- data structure
- statistical analysis
- data sources
- historical data
- original data
- data points
- real world
- prior knowledge
- complex data
- knowledge base
- data acquisition
- data distribution
- experimental data
- data analysis
- data processing
- input data
- small number
- probability distribution
- computer systems
- high dimensional
- database systems
- missing data
- data mining techniques
- natural language processing
- neural network