Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources.
Xinyan Yu, Trina Chatterjee, Akari Asai, Junjie Hu, Eunsol Choi
Published in: EMNLP (Findings) (2022)
Keyphrases
- benchmark datasets
- synthetic datasets
- high dimensional datasets
- training dataset
- massive datasets
- UCI datasets
- PASCAL VOC
- artificial and real world datasets
- digital libraries
- resource allocation
- resource management
- class imbalanced data
- language resources
- UCI machine learning repository
- construction process
- data sets
- cross lingual
- database
- image dataset
- object detection
- limited resources
- outlier detection
- resource constraints
- computing resources
- kNN
- real life
- high dimensional
- photo collections
- feature set
- training data
- decision trees
- million images
- neural network
- test data