Automated Ground Truth Data Generation for Newspaper Document Images.
Thomas Strecker, Joost van Beusekom, Sahin Albayrak, Thomas M. Breuel
Published in: ICDAR (2009)
Keyphrases
- document images
- data generation
- ground truth
- document image analysis
- document analysis
- document image understanding
- active learning
- optical character recognition
- data streams
- document processing
- page segmentation
- document image retrieval
- high throughput
- scanned documents
- historical documents
- streaming data
- mathematical formulas
- machine learning
- high dimensional data
- color images
- line extraction
- page layout