Leveraging Statistical Transliteration for Dictionary-Based English-Bengali CLIR of OCR'd Text.
Utpal Garain, Arjun Das, David S. Doermann, Douglas W. Oard
Published in: COLING (Posters) (2012)
Keyphrases
- cross language information retrieval
- cross language
- text retrieval
- machine translation system
- query translation
- machine translation
- statistical machine translation
- machine transliteration
- text recognition
- english chinese
- printed documents
- language independent
- document processing
- document retrieval
- document collections
- optical character recognition
- question answering
- information retrieval
- document images
- language resources
- parallel corpora
- chinese english
- parallel corpus
- document analysis
- word level
- comparable corpora
- information access
- translation model
- source language
- text categorization
- cross language retrieval
- linguistic resources
- page layout
- cross lingual
- bilingual dictionaries
- character recognition
- text lines
- multilingual information retrieval
- scanned images
- word alignment
- text mining
- natural language processing
- scanned documents
- target language
- text collections
- information retrieval systems
- monolingual retrieval
- english text
- string matching
- indian languages
- query terms
- text documents
- named entities
- image retrieval