A Corpus of Realistic Known-Item Topics with Associated Web Pages in the ClueWeb09.
Matthias Hagen, Daniel Wägner, Benno Stein
Published in: ECIR (2015)
Keyphrases
- web pages
- personal names
- keywords
- text data
- website
- search engine
- newspaper articles
- topic specific
- scientific papers
- topic detection and tracking
- plain text
- web search
- technical papers
- related web pages
- link analysis
- web page classification
- topic tracking
- information retrieval
- topic models
- web server
- text corpora
- web search engines
- topic detection
- web content
- test collection
- document corpus
- real world
- link information
- latent dirichlet allocation
- web users
- word pairs
- news topics
- web documents
- news articles
- web information extraction
- web graph