Detecting Off-Topic Pages in Web Archives.
Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson
Published in: TPDL (2015)
Keyphrases
- web pages
- website
- internet archive
- topic specific
- focused crawling
- focused crawler
- web users
- web documents
- topic distillation
- web information
- web crawling
- web communities
- web applications
- search engine
- home page
- relevant pages
- dynamically generated
- web data
- web content
- dynamic content
- relevant web pages
- web graph
- page content
- semantic web
- web mining
- multimedia
- metadata
- digital libraries
- web crawlers
- pagerank algorithm
- web sources
- link structure
- digital objects
- user sessions
- web browsing
- topic models
- textual content
- user interests
- content features
- anchor text
- web search
- text mining
- access logs
- data extraction
- log files