Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR).
Musa Dildar Ahmed Cheema, Mohammad Daniyal Shaiq, Farhaan Mirza, Ali Kamal, M. Asif Naeem
Published in: PeerJ Comput. Sci. (2024)
Keyphrases
- optical character recognition
- handwriting recognition
- character recognition
- english text
- text recognition
- document images
- ocr systems
- character n-grams
- language resources
- multilingual
- character segmentation
- image binarization
- page segmentation
- computer vision
- natural language
- real time
- image processing
- printed documents
- language specific
- sentiment analysis
- historical manuscripts
- word spotting
- vision system
- scanned documents
- cross language information retrieval
- text regions
- digital libraries
- text extraction
- cross lingual
- text processing
- language identification