A Language Modelling Approach to Quality Assessment of OCR'ed Historical Text.
Callum Booth, Robert Shoemaker, Robert J. Gaizauskas
Published in: LREC (2022)
Keyphrases
- quality assessment
- language modelling
- language model
- n-gram
- image quality
- information retrieval
- text retrieval
- optical character recognition
- image quality assessment
- data quality
- text mining
- text documents
- document ranking
- ad hoc retrieval
- probabilistic model
- pseudo relevance feedback
- multimedia
- keywords
- weighting scheme
- tf-idf
- search engine
- visual quality
- document images
- high quality
- text classification
- document collections
- web documents
- databases