NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark.
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
Published in: CoRR (2023)
Keyphrases
- data sets
- raw data
- image data
- data processing
- database
- training data
- historical data
- spatial data
- statistical analysis
- input data
- data points
- probability distribution
- data structure
- machine learning
- databases
- original data
- complex data
- evaluation measures
- data distribution
- correlation analysis
- attribute values
- synthetic data
- data mining techniques
- natural language processing
- end users
- social media
- prior knowledge
- high quality
- similarity measure