Chinese text categorization using the character N-gram.
Makoto Suzuki, Naohide Yamagishi, Yi-Ching Tsai
Published in: ISITA (2012)
Keyphrases
- text categorization
- character n grams
- cross language
- text classification
- n gram
- k nearest neighbor
- word segmentation
- feature selection
- variable length
- unlabeled data
- knn
- term frequency
- cross language information retrieval
- semi supervised learning
- text documents
- automatic text categorization
- optical character recognition
- tf idf
- labeled data
- feature space
- bayesian networks
- metadata
- information retrieval
- machine learning