OCR-IDL: OCR Annotations for Industry Document Library Dataset.
Ali Furkan Biten, Rubèn Tito, Lluís Gómez, Ernest Valveny, Dimosthenis Karatzas
Published in: ECCV Workshops (4) (2022)
Keyphrases
- document images
- optical character recognition
- document processing
- printed documents
- scanned documents
- post-processing
- document analysis
- document image retrieval
- character recognition
- error correction
- preprocessing
- recognition errors
- text recognition
- page segmentation
- document image analysis
- digital libraries
- information retrieval
- printed text
- page layout
- keywords
- database
- web documents
- scanned images
- character segmentation
- textual documents
- text lines
- co-occurrence
- synthetic datasets
- document classification
- semantic annotation
- end-to-end
- document collections