HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research.
Jinsuk Kim, Ho-Seop Choe, Beom-Jong You, Jeong-Hyun Seo, Suk-Hoon Lee, Dong-Yul Ra
Published in: J. Comput. Sci. Eng. (2009)
Keyphrases
- text categorization
- text collections
- text classification
- feature selection
- knn
- information gain
- text documents
- multi label
- automated text categorization
- reuters corpus
- k nearest neighbor
- information retrieval
- semi supervised learning
- naive bayes
- document categorization
- term frequency
- metadata
- feature selection for text categorization
- feature selections
- data sets
- multi instance multi label learning
- text classifiers
- term weighting
- feature weighting
- term selection
- automatic text categorization
- learning algorithm
- association rules
- labeled data
- semantic browsing
- document collections