Detecting off-topic pages within TimeMaps in Web archives.
Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson
Published in: Int. J. Digit. Libr. (2016)
Keyphrases
- web pages
- website
- internet archive
- topic specific
- web users
- focused crawling
- focused crawler
- web documents
- topic distillation
- relevant pages
- web information
- search engine
- web applications
- dynamic content
- web graph
- semantic web
- dynamically generated
- web crawling
- link information
- digital libraries
- web mining
- web content
- page content
- link analysis
- anchor text
- web server
- web queries
- user interests
- relevant web pages
- web crawlers
- textual content
- page layout
- html pages
- home page
- digital archives
- topic models
- web communities
- data extraction
- link structure
- topic modeling
- log files
- cultural heritage