Identifying Documents In-Scope of a Collection from Web Archives.
Krutarth Patel, Cornelia Caragea, Mark E. Phillips, Nathaniel T. Fox
Published in: JCDL (2020)
Keyphrases
- document repositories
- document collections
- web documents
- digital libraries
- digital collections
- web data
- multilingual documents
- metadata
- web information
- document archives
- cultural heritage
- meta information
- text collections
- web content
- time stamped
- document set
- content similarity
- web pages
- text information
- distributed information retrieval
- automatic categorization
- digital documents
- web applications
- information retrieval systems
- website
- open directory project
- news reports
- structured information
- database
- multimedia documents
- multimedia
- xml documents
- document clustering
- information retrieval
- internet archive
- text documents
- web mining
- newspaper articles
- digital archives
- textual data
- web users
- information extraction
- document retrieval
- related documents
- music collections
- relevant documents
- test collection
- cross references
- topic specific
- search interface