OCR for bilingual documents using language modeling.
Anupama Ray, Sai Rajeswar, Santanu Chaudhury
Published in: ICDAR (2015)
Keyphrases
- language modeling
- cross lingual
- comparable corpora
- information retrieval
- language model
- expert finding
- language modeling approaches
- retrieval model
- parallel corpora
- pseudo feedback
- improvements in retrieval effectiveness
- document retrieval
- query expansion
- parallel corpus
- trec collections
- multiword
- relevant documents
- ad hoc information retrieval
- relevance model
- probabilistic model
- language modeling framework
- vector space model
- information retrieval systems
- optical character recognition
- document collections
- language independent
- term frequency
- n gram
- query terms
- text classification
- cross language
- word segmentation
- document analysis
- text retrieval
- expert search
- vector space
- finite state transducers
- search engine
- sentence retrieval
- user queries
- character recognition
- document images
- statistical translation models
- test collection
- retrieval systems
- text documents
- web documents
- query specific
- machine translation
- document clustering
- retrieval effectiveness
- statistical machine translation
- cross language information retrieval
- handwriting recognition
- query translation
- text lines
- inter document similarities
- web search
- information extraction
- metadata