TTC-3600: A new benchmark dataset for Turkish text categorization.
Deniz Kilinç, Akin Özçift, Fatma Bozyigit, Pelin Yildirim, Fatih Yücalar, Emin Borandag
Published in: J. Inf. Sci. (2017)
Keyphrases
- text categorization
- benchmark datasets
- text classification
- knn
- multi label
- feature selection
- information gain
- ensemble methods
- k nearest neighbor
- feature weighting
- reuters corpus
- document categorization
- automated text categorization
- tf idf
- text collections
- term frequency
- semi supervised learning
- automatic text categorization
- semantic browsing
- naive bayes
- feature extraction
- machine learning
- text documents
- multi instance multi label learning
- feature selections
- unlabeled data