Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference.
Dat Thanh Nguyen
Published in: CoRR (2023)
Keyphrases
- data sets
- statistical analysis
- database
- raw data
- training data
- test data
- data processing
- statistical tests
- high quality
- complex data
- data quality
- data analysis
- data collection
- computer systems
- natural language
- synthetic data
- data sources
- data mining algorithms
- sensor data
- input data
- high dimensional
- benchmark datasets
- natural language processing
- image data
- statistical models
- data mining
- data points
- databases
- statistical inference