Extracting Event-Centric Document Collections from Large-Scale Web Archives.
Gerhard Gossen, Elena Demidova, Thomas Risse
Published in: CoRR (2017)
Keyphrases
- document collections
- document archives
- digital libraries
- web scale
- document retrieval
- information retrieval systems
- information retrieval
- text retrieval
- document representation
- document clustering
- automatic document classification
- scatter gather
- web documents
- text collections
- cross language
- relevant documents
- test collection
- information access
- data collections
- web pages
- text data
- metadata
- geographic information retrieval
- web users
- topic detection
- information extraction
- database