Lucene for n-grams using the CLUEWeb Collection.
Gregory B. Newby, Christopher T. Fallen, Kylie McCormick. Published in: TREC (2009)
Keyphrases
- n-gram
- language model
- document retrieval
- language independent
- bag of words
- query expansion
- language modelling
- language modeling
- text classification
- variable length
- document collections
- part of speech
- word segmentation
- open source
- test collection
- search engine
- viterbi algorithm
- inside outside algorithm
- retrieval model
- web documents
- association rules
- natural language
- information retrieval
- data analysis
- character n-grams
- machine learning