ClusTex: Information Extraction from HTML Pages.
Fatima Ashraf, Reda Alhajj
Published in: AINA Workshops (1) (2007)
Keyphrases
- html pages
- information extraction
- semi structured
- data extraction
- website
- html documents
- structured data
- natural language processing
- web documents
- text mining
- information retrieval
- semi structured data
- web pages
- information integration
- machine learning
- named entities
- text documents
- natural language
- data integration
- web mining
- web databases
- web information
- data model
- semistructured data
- domain knowledge