Can We Find Documents in Web Archives without Knowing their Contents?
Khoi Duy Vo, Tuan Tran, Tu Ngoc Nguyen, Xiaofei Zhu, Wolfgang Nejdl
Published in: CoRR (2017)
Keyphrases
- web documents
- metadata
- web information
- web data
- content similarity
- web content
- multilingual documents
- website
- text content
- digital libraries
- digital documents
- document collections
- document repositories
- web pages
- database
- text information
- multimedia
- web mining
- information retrieval
- web data mining
- html pages
- textual contents
- web applications
- page contents
- open directory project
- multimedia documents
- information retrieval systems
- search interface
- document archives
- structured information
- semantic web
- relevant documents
- search engine
- electronic documents
- historical manuscripts
- document retrieval
- web crawler
- keywords
- web search engines
- word spotting
- textual data
- newspaper articles
- helping users
- semi structured
- document clustering
- textual content
- web users
- user interests
- text documents
- extensible markup language
- digital archives
- text mining
- current web search engines