Extracting Informative Sections of Web Documents Based on Scoring DOM Subtrees.
Yong-Hyuk Kim, Dong-ug Kim, Sejun Ahn
Published in: International Conference on Internet Computing (2008)
Keyphrases
- web documents
- web pages
- html documents
- semi structured
- document classification
- information extraction
- website
- tree structure
- data extraction
- web search engines
- keywords
- web content
- textual information
- vector space model
- dynamically generated
- focused crawling
- automatic extraction
- web data
- structured data
- structured documents
- text mining
- web directories
- search engine
- information retrieval