A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models.
Gurpreet Singh Lehal
Published in: MOCR@ICDAR (2013)
Keyphrases
- language model
- cross language retrieval
- language modeling
- cross lingual
- statistical machine translation
- elastic matching
- probabilistic model
- machine translation
- n gram
- translation model
- multiword
- query expansion
- language modelling
- document retrieval
- natural language
- information retrieval
- retrieval model
- cross language
- comparable corpora
- cross language information retrieval
- chinese english
- test collection
- parallel corpora
- language independent
- query translation
- context sensitive
- optical character recognition
- statistical language models
- speech recognition
- indian languages
- document ranking
- parallel corpus
- document images
- smoothing methods
- retrieval effectiveness
- query terms
- spoken term detection
- out of vocabulary
- character recognition
- machine translation system
- pseudo relevance feedback