Towards Crawling the Web for Structured Data: Pitfalls of Common Crawl for E-Commerce.
Alex Stolz, Martin Hepp
Published in: COLD (2015)
Keyphrases
- structured data
- web crawlers
- web pages
- semi structured
- web crawling
- semi structured data
- structured information
- focused crawling
- linked data
- web crawler
- textual data
- web sources
- search engine
- deep web
- unstructured data
- web mining
- website
- information extraction
- web data
- data sources
- web documents
- xml documents
- metadata
- unstructured text
- web search
- topic specific
- web server
- web databases
- structured and unstructured data
- link analysis
- web content
- keyword queries
- unstructured information
- database
- web users
- end users
- information retrieval
- web logs
- web graph
- semantic web
- natural language processing
- keywords
- decision trees
- databases
- structured databases