Index-based n-gram extraction from large document collections.
Michal Krátký, Radim Baca, David Bednar, Jirí Walder, Jiri Dvorský, Peter Chovanec
Published in: ICDIM (2011)
Keyphrases
- n gram
- language model
- text classification
- bag of words
- language independent
- language modelling
- viterbi algorithm
- part of speech
- variable length
- language modeling
- inside outside algorithm
- information extraction
- machine learning
- retrieval model
- similarity search
- web documents
- naive bayes
- query expansion
- digital libraries
- databases