PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents.
Nan Zhang, Connor T. Heaton, Sean Timothy Okonsky, Prasenjit Mitra, Hilal Ezgi Toraman
Published in: LREC/COLING (2024)
Keyphrases
- optical character recognition
- scientific documents
- character recognition
- text recognition
- document images
- OCR systems
- digital libraries
- character segmentation
- image binarization
- handwriting recognition
- printed documents
- page segmentation
- printed text
- scanned documents
- computer vision
- video sequences
- historical manuscripts
- image processing