Lost but not forgotten: finding pages on the unarchived web.
Hugo C. Huurdeman, Jaap Kamps, Thaer Samar, Arjen P. de Vries, Anat Ben-David, Richard A. Rogers
Published in: Int. J. Digit. Libr. (2015)
Keyphrases
- website
- web pages
- web users
- web documents
- web information
- web applications
- web graph
- dynamically generated
- dynamic content
- home page
- web content
- search engine
- web objects
- web crawling
- web resources
- information sources
- web data
- web communities
- page contents
- link structure
- user generated content
- link analysis
- semantic web
- web technologies
- focused crawling
- search queries
- ranking algorithm
- web mining
- web crawlers
- content similarity
- focused crawler
- page content
- user interaction
- page layout