N-gram Counts and Language Models from the Common Crawl.
Christian Buck, Kenneth Heafield, Bas van Ooyen
Published in: LREC (2014)
Keyphrases
- language model
- n gram
- language modeling
- web search
- probabilistic model
- language modelling
- language independent
- speech recognition
- document retrieval
- information retrieval
- search engine
- retrieval model
- query terms
- bag of words
- context sensitive
- query expansion
- word segmentation
- test collection
- part of speech
- web pages
- pseudo relevance feedback
- vector space model
- document representation
- natural language processing
- word level
- smoothing methods