PHD: Pixel-Based Language Modeling of Historical Documents.
Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
Published in: EMNLP (2023)
Keyphrases
- language modeling
- historical documents
- language model
- handwriting recognition
- word segmentation
- word recognition
- retrieval model
- information retrieval
- query expansion
- document images
- n-gram
- probabilistic model
- speech recognition
- cross-lingual
- historical manuscripts
- text classification
- character recognition
- text retrieval
- document analysis
- question answering
- text mining
- knowledge discovery