Identifying Documents In-Scope of a Collection from Web Archives.
Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel T. Fox
Published in: CoRR (2020)
Keyphrases
- document repositories
- document collections
- web documents
- digital libraries
- web data
- metadata
- digital collections
- multilingual documents
- meta information
- information retrieval
- document archives
- website
- text collections
- web information
- digital documents
- information retrieval systems
- content similarity
- distributed information retrieval
- web applications
- textual data
- document set
- automatic categorization
- cultural heritage
- newspaper articles
- document representation
- database
- multimedia
- time stamped
- web pages
- text information
- adversarial information retrieval
- text categorization
- text documents
- relevant documents
- topic specific
- keywords
- news reports
- current web search engines
- cross references
- music collections
- web crawler
- effective retrieval
- multimedia documents
- relevant content
- answering questions
- user queries
- linked data
- test collection
- historical manuscripts
- vector space model
- user generated content
- semantic web
- google scholar
- language model
- open directory project
- structured information