OCR Cleaning of Scientific Texts with LLMs.
Gábor Madarász, Noémi Ligeti-Nagy, Andras Holl, Tamás Váradi
Published in: NSLP (2024)
Keyphrases
- optical character recognition
- post processing
- scientific papers
- document images
- preprocessing
- scientific data
- character recognition
- recognition errors
- text recognition
- scanned documents
- printed documents
- scientific discovery
- error correction
- artificial intelligence
- hidden markov models
- image processing
- data mining
- neural network
- scientific disciplines
- database
- end to end
- information extraction
- data extraction
- science learning
- scientific literature
- document processing