PHD: Pixel-Based Language Modeling of Historical Documents.
Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
Published in: CoRR (2023)
Keyphrases
- language modeling
- historical documents
- language model
- handwriting recognition
- word recognition
- word segmentation
- information retrieval
- document images
- query expansion
- retrieval model
- n-gram
- probabilistic model
- historical manuscripts
- cross-lingual
- text classification
- speech recognition
- test collection
- document retrieval
- relevance model
- machine learning
- hidden Markov models
- Bayesian networks