Document clustering using character N-grams: a comparative evaluation with term-based and word-based clustering.
Yingbo Miao, Vlado Keselj, Evangelos E. Milios
Published in: CIKM (2005)
Keyphrases
- document clustering
- comparative evaluation
- character n-grams
- document representation
- n-gram
- clustering algorithm
- clustering method
- variable length
- document collections
- cross language
- k-means
- text mining
- term frequency
- text documents
- tf-idf
- document clusters
- cross lingual
- cross language information retrieval
- vector space model
- cluster analysis
- language specific
- text categorization
- text classification
- bag of words
- keywords
- data mining
- optical character recognition
- term weighting
- co-occurrence
- information retrieval
- character recognition
- web documents