As we may perceive: finding the boundaries of compound documents on the web.
Pavel Dmitriev
Published in: WWW (2008)
Keyphrases
- web documents
- web data
- multilingual documents
- web information
- website
- information retrieval
- document collections
- web pages
- web applications
- newspaper articles
- semantic web
- relevant documents
- content similarity
- web crawler
- information retrieval systems
- web mining
- current web search engines
- web queries
- document repositories
- structured information
- web users
- open directory project
- web content
- metadata
- linked data
- document clustering
- digital documents
- web search
- desired information
- web environment
- text information
- semi-structured
- multimedia documents
- textual data
- xml documents
- textual features
- information extraction
- page layout
- focused crawling
- link analysis
- web resources
- information sources
- web search engines
- html pages
- answering questions
- document classification
- user interests
- search engine
- database