Extracting Event-Centric Document Collections from Large-Scale Web Archives.
Gerhard Gossen, Elena Demidova, Thomas Risse
Published in: TPDL (2017)
Keyphrases
- document collections
- document archives
- digital libraries
- document retrieval
- information retrieval systems
- web scale
- information retrieval
- test collection
- text retrieval
- scatter gather
- web pages
- automatic document classification
- topic detection
- web documents
- relevant documents
- multimedia
- information access
- internet archive
- document representation
- cross language
- text data
- web users
- text corpora
- xml retrieval
- data collections
- geographic information retrieval
- database
- databases