Strigil: A Framework for Data Extraction in Semi-Structured Web Documents.
Jakub Stárka, Irena Holubová, Martin Necaský
Published in: iiWAS (2013)
Keyphrases
- semi structured
- web documents
- data extraction
- structured data
- information extraction
- information integration
- web data extraction
- web data
- web pages
- semi structured data
- web sources
- tree structured patterns
- wrapper generation
- semistructured data
- data model
- html pages
- keywords
- xml databases
- text mining
- semistructured documents
- web content
- social network analysis
- html documents
- web search engines
- data integration
- document collections
- data sources
- database