Heuristic learning of rules for information extraction from web documents.
Dawei Hu, Huan Li, Tianyong Hao, Enhong Chen, Liu Wenyin
Published in: Infoscale (2007)
Keyphrases
- web documents
- information extraction
- relational learning
- semi structured
- learning algorithm
- keywords
- learning process
- structured documents
- named entities
- unstructured documents
- machine learning
- wrapper induction
- unstructured text
- html documents
- textual data
- background knowledge
- web search engines
- natural language processing
- supervised learning
- active learning
- training set
- natural language
- information retrieval