Automatic classification of defect page content in scanned document collections.

Reinhold Huber-Mörk, Alexander Schindler

Published in: ISPA (2013)

Keyphrases

document collections
automatic classification
page content
spam detection
web browsing
anchor text
search engine
information retrieval systems
web pages
document representation
information retrieval
test collection
document retrieval
text retrieval
relevant documents
document clustering
automatic detection
document images
digital libraries
topic detection
information extraction
keywords
query terms
web search
content features
databases
web users
email
scatter gather