OCR Synthetic Benchmark Dataset for Indic Languages.
Naresh Saini, Promodh Pinto, Aravinth Bheemaraj, Deepak Kumar, Dhiraj Daga, Saurabh Yadav, Srihari Nagaraj
Published in: CoRR (2022)
Keyphrases
- benchmark datasets
- optical character recognition
- document images
- language independent
- character recognition
- error correction
- post processing
- expressive power
- pedestrian detection
- printed documents
- real world
- real images are presented
- recognition errors
- preprocessing
- text recognition
- information retrieval
- terms of classification accuracy
- spoken language
- language identification
- document image analysis
- document processing
- scanned documents
- word recognition
- English text
- multilingual
- real scenes
- connected components
- OCR systems
- database systems
- multilingual information retrieval