Information Extraction from Template-Generated Hidden Web Documents.
Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson
Published in: ICWI (2004)
Keyphrases
- web documents
- information extraction
- semi-structured
- document classification
- wrapper induction
- web search engines
- information retrieval
- machine learning
- dynamically generated
- named entities
- web pages
- natural language processing
- named entity recognition
- html documents
- natural language
- web content
- unstructured documents
- text mining
- keywords
- website
- unstructured text
- web mining
- structured data
- textual data
- relation extraction
- question answering
- focused crawling
- text documents
- vector space model
- search engine
- text categorization