Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction.
Laura Manrique-Gómez, Tony Montes, Rubén Manrique
Published in: CoRR (2024)
Keyphrases
- latin american
- spanish language
- error correction
- optical character recognition
- document images
- manually annotated
- preprocessing
- character recognition
- post-processing
- historical manuscripts
- supervised machine learning
- machine translation system
- recognition errors
- historical documents
- st century
- text recognition
- ocr systems
- error detection
- question answering
- george washington
- handwriting recognition
- open domain
- document processing
- co-occurrence
- hidden markov models