Text/Figure Separation in Document Images Using Docstrum Descriptor and Two-Level Clustering.
Valery Anisimovskiy, Ilya Kurilin, Andrey Shcherbinin, Petr Pohl
Published in: Visual Information Processing and Communication (2018)
Keyphrases
- document images
- document analysis
- printed documents
- document processing
- OCR systems
- page layout
- printed text
- scanned document images
- word level
- historical documents
- scanned documents
- text lines
- mathematical formulas
- text regions
- document image analysis
- k-means
- optical character recognition
- document image understanding
- clustering method
- clustering algorithm
- document layout
- scanned images
- handwritten documents
- line extraction
- document image retrieval
- Indian languages
- document clustering
- page segmentation
- word spotting
- language identification
- text documents
- text retrieval
- object recognition
- information retrieval