A Novel Method for Extracting Information from Web Pages with Multiple Presentation Templates.
Qingzhong Li, Yanhui Ding, An Feng, Yongquan Dong
Published in: J. Softw. (2010)
Keyphrases
- prior information
- web content
- web pages
- detection method
- web documents
- high precision
- pairwise
- computational cost
- information retrieval
- information overload
- segmentation method
- high accuracy
- support vector machine
- preprocessing
- data sets
- dynamic programming
- cost function
- computational complexity
- similarity measure
- search engine
- genetic algorithm
- information loss
- statistical information