Autonomous Document Cleaning - A Generative Approach to Reconstruct Strongly Corrupted Scanned Texts.
Zhenwen Dai, Jörg Lücke
Published in: IEEE Trans. Pattern Anal. Mach. Intell. (2014)
Keyphrases
- document images
- text documents
- scanned documents
- keywords
- information retrieval
- information retrieval systems
- generative model
- web documents
- authorship attribution
- text content
- retrieval systems
- natural language
- electronic documents
- document classification
- document retrieval
- natural language text
- optical character recognition
- scientific papers
- scanned images
- text classification
- robotic systems
- noise free
- document collections
- document clustering
- vector space model
- tf idf
- document representation
- database
- bag of words
- text corpus
- semantic information
- text mining
- cooperative
- machine learning