OCR Synthetic Benchmark Dataset for Indic Languages.

Naresh Saini, Promodh Pinto, Aravinth Bheemaraj, Deepak Kumar, Dhiraj Daga, Saurabh Yadav, Srihari Nagaraj

Published in: CoRR (2022)

Keyphrases

benchmark datasets
optical character recognition
document images
language independent
character recognition
error correction
post processing
expressive power
pedestrian detection
printed documents
real world
real images are presented
recognition errors
preprocessing
text recognition
information retrieval
terms of classification accuracy
spoken language
language identification
document image analysis
document processing
scanned documents
word recognition
english text
multi lingual
real scenes
connected components
ocr systems
database systems
multilingual information retrieval