TEXT: Automatic Template Extraction from Heterogeneous Web Pages.
Chulyun Kim, Kyuseok Shim
Published in: IEEE Trans. Knowl. Data Eng. (2011)
Keyphrases
- web pages
- web documents
- web information extraction
- keywords
- data extraction
- automatically extracted
- text extraction
- text content
- website
- automatic text
- textual content
- content features
- template matching
- search engine
- fully automatic
- text mining
- anchor text
- information extraction
- web data extraction
- plain text
- html pages
- semi automatically
- automatic extraction
- text retrieval
- text data
- free text
- matching algorithm
- semi automatic
- textual contents
- textual data
- technical papers
- web content mining
- textual features
- automatically extracting
- link analysis
- web users
- text documents
- web search engines
- information retrieval
- text information
- database
- content extraction
- news pages
- web search