Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources.
Xinyan Velocity Yu, Akari Asai, Trina Chatterjee, Junjie Hu, Eunsol Choi
Published in: CoRR (2022)
Keyphrases
- synthetic datasets
- benchmark datasets
- training dataset
- high dimensional datasets
- uci datasets
- massive datasets
- image dataset
- pascal voc
- class imbalanced data
- language resources
- limited resources
- artificial and real world datasets
- database
- test data
- neural network
- text classification tasks
- real world
- gene expression datasets
- object detection
- resource allocation
- real life
- high dimensional
- linguistic resources
- machine learning
- photo collections
- million images
- bag of words
- cross language information retrieval