Collecting 16K archived web pages from 17 public web archives.
Mohamed Aturban, Michael L. Nelson, Michele C. Weigle, Martin Klein, Herbert Van de Sompel
Published in: CoRR (2019)
Keyphrases
- web pages
- website
- web documents
- web content
- web data
- digital libraries
- link analysis
- search engine
- web resources
- web users
- web communities
- web search engines
- data extraction
- personal names
- web mining
- hyperlink structure
- social bookmarking
- dynamic content
- web sources
- web spam
- digital archives
- web information
- google search engine
- link structure
- browsing experience
- page content
- web page classification
- keywords
- web information extraction
- web search
- web server
- adversarial information retrieval
- web spam detection
- dynamically generated
- metadata
- web browser
- web objects
- web crawlers
- web portals
- current web
- web browsing
- digital data
- information retrieval
- home page
- multimedia
- web page content
- textual contents
- classifying web pages
- data collection