Unweaving a web of documents.
Ramanathan V. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram
Published in: KDD (2005)
Keyphrases
- web documents
- web data
- multilingual documents
- web information
- website
- web pages
- web applications
- document classification
- open directory project
- document repositories
- content similarity
- digital documents
- structured information
- information retrieval
- vector space model
- xml documents
- newspaper articles
- information retrieval systems
- database
- relevant documents
- web mining
- linked data
- document collections
- electronic documents
- focused crawling
- topic specific
- user interests
- web users
- relevant content
- information sources
- semantic web
- page layout
- web content
- textual features
- web crawler
- web queries
- web environment
- document clustering
- text information
- text documents
- document retrieval
- information extraction
- end users
- answering questions
- focused crawler
- digital libraries
- current web search engines
- multimedia documents