Sub-Word Embeddings for OCR Corrections in Highly Fusional Indic Languages.
Rohit Saluja, Mayur Punjabi, Mark J. Carman, Ganesh Ramakrishnan, Parag Chaudhuri
Published in: ICDAR (2019)
Keyphrases
- word recognition
- optical character recognition
- character recognition
- printed documents
- recognition errors
- language-independent
- character n-grams
- English text
- language-specific
- handwriting recognition
- page layout
- document images
- target language
- expressive power
- document image retrieval
- document analysis
- grammar induction
- word-level
- n-gram
- statistical machine translation
- post-processing
- error correction
- spoken language
- bilingual dictionaries
- low-dimensional
- co-occurrence
- Indian languages
- dimensionality reduction
- word order
- Arabic documents
- printed text
- compound words
- source language
- manifold learning
- vector space
- machine translation
- preprocessing
- document processing
- language identification
- word segmentation
- text summarization
- text recognition
- cross-lingual
- word spotting
- euclidean space
- word sense disambiguation
- keywords