Theme Extraction from Chinese Web Documents Based on Page Segmentation and Entropy.
Deqing Wang, Hui Zhang, Gang Zhou
Published in: ISMIS (2009)
Keyphrases
- web documents
- page segmentation
- web pages
- information extraction
- comparative evaluation
- document images
- web information extraction
- storage and retrieval
- web search engines
- semi-structured
- keywords
- website
- html documents
- search engine
- web content
- evaluation methods
- error analysis
- optical character recognition
- automatic extraction
- wrapper induction
- structured data
- focused crawling
- knowledge discovery
- image processing
- structured documents
- text mining
- semantic information