Selecting relevant text subsets from web-data for building topic specific language models.
Abhinav Sethy, Panayiotis G. Georgiou, Shrikanth S. Narayanan. Published in: HLT-NAACL (2006)
Keyphrases
- language model
- web data
- topic specific
- web documents
- web queries
- web crawling
- information retrieval
- web pages
- language modeling
- web mining
- n gram
- semi structured
- document retrieval
- probabilistic model
- vector space model
- retrieval model
- query expansion
- text retrieval
- web content
- test collection
- query terms
- keywords
- text mining
- query logs
- search engine
- topic modeling
- word pairs
- information extraction
- web search engines
- deep web
- web sources
- text classification
- anchor text
- topic models
- semantic information
- document representation
- bayesian networks
- text corpora
- website