Relative N-gram signatures: Document visualization at the level of character N-grams.
Magdalena Jankowska, Vlado Keselj, Evangelos E. Milios
Published in: IEEE VAST (2012)
Keyphrases
- n gram
- character n grams
- word level
- language model
- web documents
- variable length
- language independent
- bag of words
- language modeling
- text classification
- part of speech
- document retrieval
- cross language
- document level
- document representation
- information retrieval
- word segmentation
- cross language information retrieval
- retrieval systems
- information retrieval systems
- text retrieval
- text documents
- document clustering
- knn
- language specific
- neural network