Ducky: a data extraction system for various structured web documents.
Kei Kanaoka, Yotaro Fujii, Motomichi Toyama
Published in: IDEAS (2014)
Keyphrases
- data extraction
- web documents
- semi structured
- web pages
- tree structured patterns
- information extraction
- web data extraction
- structured data
- web sources
- wrapper generation
- web search engines
- web content
- keywords
- data integration
- semistructured data
- web data
- information integration
- semi structured data
- html documents
- website
- real world
- document representation
- xml files
- natural language
- metadata
- search engine
- similarity measure
- business intelligence
- web search
- natural language processing