Discovering informative content blocks from Web documents.
Shian-Hua Lin, Jan-Ming Ho. Published in: KDD (2002)
Keyphrases
- web documents
- information extraction
- semi-structured
- web content
- web pages
- web search engines
- content similarity
- textual information
- keywords
- document classification
- web data
- document representation
- prefetching
- link structure
- structured documents
- html documents
- vector space model
- databases
- focused crawling
- wrapper induction