Can we find documents in web archives without knowing their contents?
Khoi Duy Vo, Tuan Tran, Tu Ngoc Nguyen, Xiaofei Zhu, Wolfgang Nejdl
Published in: WebSci (2016)
Keyphrases
- metadata
- web documents
- web information
- web data
- content similarity
- web content
- website
- text content
- digital documents
- multilingual documents
- information retrieval
- web pages
- multimedia
- digital libraries
- document structure
- web applications
- HTML pages
- XML documents
- page contents
- textual content
- web mining
- document archives
- structured information
- document repositories
- search interface
- text information
- search engine
- multimedia documents
- database
- electronic documents
- tag clouds
- document retrieval
- document collections
- Open Directory Project
- helping users
- web crawler
- web data mining
- web users
- keywords
- information retrieval systems
- relevant documents
- cultural heritage
- textual data
- user interests
- topic specific
- text documents
- web queries
- digital archives
- newspaper articles
- search tasks
- relevant content
- semi-structured
- test collection
- query expansion
- Google Scholar
- current web search engines