Combining linguistic and statistical analysis to extract relations from web documents.

Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum

Published in: KDD (2006)

Keyphrases

web documents
statistical analysis
information extraction
web search engines
semi-structured
web content
wrapper induction
web pages
document classification
keywords
vector space model
html documents
natural language processing
web directories
topic-specific
focused crawling
unstructured documents
web data
document representation
databases
dynamically generated
content similarity
learning algorithm
machine learning