A prototype Gutenberg-HathiTrust sentence-level parallel corpus for OCR error analysis: pilot investigations.
Ming Jiang, Ryan C. Dubnicek, Glen Worthey, Ted Underwood, J. Stephen Downie
Published in: JCDL (2022)
Keyphrases
- error analysis
- sentence level
- parallel corpus
- error correction
- parallel corpora
- cross-lingual
- sentiment analysis
- sentiment classification
- least squares
- novelty detection
- multi-document summarization
- machine translation
- query translation
- document images
- cross-language information retrieval
- statistical machine translation
- target language
- relevance model
- language modeling
- natural language
- co-occurrence
- semi-supervised