Word Length n-Grams for Text Re-use Detection.
Alberto Barrón-Cedeño, Chiara Basile, Mirko Degli Esposti, Paolo Rosso
Published in: CICLing (2010)
Keyphrases
- n gram
- language model
- character n grams
- text classification
- language independent
- information retrieval
- word level
- bag of words
- web documents
- variable length
- text retrieval
- language modelling
- text mining
- viterbi algorithm
- language modeling
- artificial intelligence
- text documents
- cross language
- semantic information
- part of speech
- relevance ranking
- language specific
- document analysis
- word segmentation
- text categorization
- information retrieval systems
- probabilistic model
- keywords
- inside outside algorithm