Combining linguistic and statistical analysis to extract relations from web documents.
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum. Published in: KDD (2006)
Keyphrases
- web documents
- statistical analysis
- information extraction
- web search engines
- semi-structured
- web content
- wrapper induction
- web pages
- document classification
- keywords
- vector space model
- html documents
- natural language processing
- web directories
- topic specific
- focused crawling
- unstructured documents
- web data
- document representation
- databases
- dynamically generated
- content similarity
- learning algorithm
- machine learning