SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects.
David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba O. Alabi, Yanke Mao, Haonan Gao, En-Shiun Annie Lee
Published in: EACL (1) (2024)
Keyphrases
- benchmark datasets
- feature set
- pattern recognition
- image classification
- feature extraction
- automatic classification
- support vector machine
- feature vectors
- machine learning algorithms
- classification method
- decision trees
- expressive power
- feature selection
- classification models
- classification scheme
- classification algorithm
- decision rules
- training dataset
- classification systems
- text classification
- supervised learning
- classification accuracy
- support vector
- class labels
- support vector machine svm
- feature space
- document classification
- preprocessing
- web pages
- data sets
- uci datasets