Detecting data records in semi-structured web sites based on text token clustering.
Xiaoying Gao, Le Phong Bao Vuong, Mengjie Zhang
Published in: Integr. Comput. Aided Eng. (2008)
Keyphrases
- semi structured
- data records
- data extraction
- html pages
- structured data
- free text
- website
- web documents
- web pages
- text mining
- web data sources
- information extraction
- data model
- wrapper induction
- wrapper generation
- web data
- web data extraction
- information integration
- xml databases
- web content
- data sets
- query result
- web search engines
- search engine
- information retrieval
- database
- web server
- html documents
- keywords