Hidden Schema Extraction in Web Documents.
Vincenza Carchiolo, Alessandro Longheu, Michele Malgeri
Published in: DNIS (2003)
Keyphrases
- web documents
- information extraction
- semi structured
- web pages
- document classification
- web search engines
- semistructured data
- textual information
- data model
- databases
- focused crawling
- web data
- web content
- natural language processing
- keywords
- automatic extraction
- data extraction
- document representation
- web information extraction
- extraction rules
- html documents
- unstructured documents
- database schema
- database
- topic specific
- structured documents
- wrapper induction
- content similarity
- machine learning
- xml data
- link structure
- database systems
- vector space model
- xml schema