Web Data Commons - Extracting Structured Data from Two Large Web Corpora.
Hannes Mühleisen, Christian Bizer
Published in: LDOW (2012)
Keyphrases
- web data
- structured data
- semi structured
- web corpora
- semi structured data
- web mining
- information extraction
- query expansion
- query translation
- web usage mining
- semistructured data
- xml documents
- data sources
- web pages
- keyword search
- web databases
- textual data
- metadata
- page contents
- web content
- language model
- search engine