An Open Vocabulary OCR System with Hybrid Word-Subword Language Models.
Meng Cai, Wenping Hu, Kai Chen, Lei Sun, Sen Liang, Xiongjian Mo, Qiang Huo
Published in: ICDAR (2017)
Keyphrases
- language model
- out-of-vocabulary
- n-gram
- spoken document retrieval
- language modeling
- probabilistic model
- language modelling
- document retrieval
- speech recognition
- spoken term detection
- retrieval model
- information retrieval
- context sensitive
- query expansion
- query terms
- optical character recognition
- ad hoc information retrieval
- test collection
- translation model
- document images
- vector space model
- document ranking
- broadcast news
- word segmentation
- character recognition
- pseudo relevance feedback
- smoothing methods
- co-occurrence
- word error rate
- improve retrieval effectiveness
- language models for information retrieval
- language independent
- cross-language information retrieval
- statistical language modeling
- bag of words