Dissimilarities Detections in Texts Using Symbol n-grams and Word Histograms.
Gabriela Andrejková, Abdulwahed Almarimi
Published in: Open Comput. Sci. (2016)
Keyphrases
- n gram
- language model
- language independent
- character n grams
- variable length
- language modelling
- bag of words
- language modeling
- text classification
- viterbi algorithm
- out of vocabulary
- word segmentation
- web documents
- word level
- information extraction
- part of speech
- language specific
- test collection
- neural network
- natural language
- bayesian networks
- feature selection