OCR-IDL: OCR Annotations for Industry Document Library Dataset.
Ali Furkan Biten, Rubèn Tito, Lluís Gómez, Ernest Valveny, Dimosthenis Karatzas
Published in: CoRR (2022)
Keyphrases
- document images
- optical character recognition
- document processing
- printed documents
- scanned documents
- document analysis
- character recognition
- post processing
- text recognition
- document image retrieval
- error correction
- text lines
- document image analysis
- page segmentation
- preprocessing
- character segmentation
- recognition errors
- web documents
- handwriting recognition
- database
- semantic annotation
- page layout
- document clustering
- keywords
- information retrieval
- scanned images
- digital libraries
- historical documents
- benchmark datasets
- document classification