Web Text Data Mining for Building Large Scale Language Modelling Corpus.
Jan Svec, Jan Hoidekr, Daniel Soutner, Jan Vavruska
Published in: TSD (2011)
Keyphrases
- language modelling
- data mining
- text mining
- web documents
- language model
- web mining
- text data
- n-gram
- information retrieval
- multiword
- text documents
- ad hoc retrieval
- machine learning
- web data
- pseudo relevance feedback
- web pages
- semantic information
- text retrieval
- web users
- natural language processing
- link analysis
- tf-idf
- query expansion
- weighting scheme
- keywords