Tamizhi-Net OCR: Creating A Quality Large Scale Tamil-Sinhala-English Parallel Corpus Using Deep Learning Based Printed Character Recognition (PCR).
Charangan Vasantharajan, Uthayasanker Thayasivam
Published in: CoRR (2021)
Keyphrases
- character recognition
- parallel corpus
- optical character recognition
- deep learning
- handwritten characters
- cross lingual
- machine vision
- word recognition
- machine translation
- Chinese characters
- cross language information retrieval
- statistical machine translation
- language independent
- handwriting recognition
- unsupervised learning
- query translation
- character segmentation
- word alignment
- printed documents
- machine translation system
- machine learning
- Indian languages
- target language
- word level
- document analysis
- weakly supervised
- feature selection
- source language
- parallel corpora
- cross language
- conditional random fields
- graph cuts