Realization of a high performance bilingual OCR system for Thai-English printed documents.
Supachai Tangwongsan, Buntida Suvacharakulton
Published in: NLPKE (2010)
Keyphrases
- printed documents
- language independent
- machine translation
- cross lingual
- cross language
- parallel corpus
- word segmentation
- parallel corpora
- optical character recognition
- document images
- chinese english
- cross language information retrieval
- character recognition
- query translation
- document analysis
- statistical machine translation
- target language
- character segmentation
- word level
- document processing
- source language
- language modeling
- bilingual dictionaries
- text retrieval
- information extraction
- machine translation system
- document image analysis
- natural language
- question answering
- natural language processing
- information access
- n gram
- text classification
- handwritten documents
- document collections
- machine learning
- relevance feedback
- information retrieval systems
- text categorization
- document retrieval
- pattern recognition