A Benchmark and Dataset for Post-OCR text correction in Sanskrit.
Ayush Maheshwari, Nikhil Singh, Amrith Krishna, Ganesh Ramakrishnan
Published in: CoRR (2022)
Keyphrases
- text recognition
- printed documents
- optical character recognition
- error correction
- post processing
- document processing
- document analysis
- text extraction
- document images
- database
- character recognition
- information retrieval
- page layout
- text mining
- finite state automata
- scanned documents
- text lines
- OCR systems
- printed text
- preprocessing
- free text
- text documents
- benchmark datasets
- web documents
- scanned images
- text analysis
- action recognition
- n-gram
- topic models
- video sequences
- machine learning
- real world