SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects.
David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba O. Alabi, Yanke Mao, Haonan Gao, En-Shiun Annie Lee
Published in: CoRR (2023)
Keyphrases
- benchmark datasets
- support vector machine
- classification scheme
- document classification
- classification accuracy
- decision trees
- text classification
- image classification
- evaluation method
- database
- feature space
- pattern recognition
- training dataset
- language independent
- pattern classification
- uci datasets
- classification rules
- topic models
- feature extraction
- classification algorithm
- classification method
- expressive power
- support vector machine svm
- classification models
- supervised learning
- feature vectors
- automatic classification
- high dimensional
- support vector
- classification systems
- machine learning