Large Synthetic Data from the arχiv for OCR Post Correction of Historic Scientific Articles.
Jill P. Naiman, Morgan G. Cosillo, Peter K. G. Williams, Alyssa Goodman
Published in: TPDL (2023)
Keyphrases
- synthetic data
- scientific articles
- error correction
- augmented reality
- topic modeling
- scientific literature
- optical character recognition
- document images
- data sets
- character recognition
- real world
- real image data
- mri data
- topic models
- synthetic datasets
- data sources
- feature extraction
- multimedia
- artificial intelligence
- databases