Generating Correction Candidates for OCR Errors using BERT Language Model and FastText SubWord Embeddings.
Mahdi Hajiali, Jorge Ramón Fonseca Cacho, Kazem Taghva
Published in: SAI (1) (2021)
Keyphrases
- language model
- n gram
- speech recognition
- language modeling
- out of vocabulary
- error correction
- document retrieval
- probabilistic model
- information retrieval
- language modelling
- optical character recognition
- query expansion
- test collection
- vector space
- statistical language models
- retrieval model
- dimensionality reduction
- smoothing methods
- mixture model
- translation model
- context sensitive
- document images
- language independent
- language models for information retrieval
- ad hoc information retrieval
- low dimensional
- statistical machine translation
- word clouds
- statistical language modeling
- language model for information retrieval
- word segmentation
- relevance model
- pseudo relevance feedback
- vector space model
- query terms