Automatic classification of defect page content in scanned document collections.
Reinhold Huber-Mörk, Alexander Schindler. Published in: ISPA (2013)
Keyphrases
- document collections
- automatic classification
- page content
- spam detection
- web browsing
- anchor text
- search engine
- information retrieval systems
- web pages
- document representation
- information retrieval
- test collection
- document retrieval
- text retrieval
- relevant documents
- document clustering
- automatic detection
- document images
- digital libraries
- topic detection
- information extraction
- keywords
- query terms
- web search
- content features
- databases
- web users
- scatter gather