English and Taiwanese text categorization using N-gram based on Vector Space Model.
Makoto Suzuki, Naohide Yamagishi, Yi-Ching Tsai, Takashi Ishida, Masayuki Goto
Published in: ISITA (2010)
Keyphrases
- text categorization
- vector space model
- cross language
- tf idf
- information retrieval
- document categorization
- average precision
- text classification
- retrieval model
- text documents
- document representation
- language model
- n gram
- semantic similarity
- document clustering
- vector space
- feature selection
- knn
- term frequency
- latent semantic indexing
- web documents
- text representation
- semantic information
- semi supervised learning
- term weighting
- k nearest neighbor
- test collection
- document collections
- unlabeled data
- natural language
- document retrieval
- image classification
- text mining
- information extraction