Towards Triple-Based Information Extraction from Visually-Structured HTML Pages.
Vojtech Svátek, Jirí Bráza, Vilém Sklenák
Published in: WWW (Posters) (2003)
Keyphrases
- html pages
- information extraction
- semi structured
- structured data
- data extraction
- website
- html documents
- natural language processing
- text mining
- web documents
- web pages
- web sources
- named entities
- machine learning
- information integration
- semistructured data
- information retrieval
- semi structured data
- xml documents
- text documents
- web mining
- web databases
- data model
- web search
- multiple sources
- web information
- natural language