Impact of OCR Quality on BERT Embeddings in the Domain Classification of Book Excerpts
Ming Jiang, Yuerong Hu, Glen Worthey, Ryan C. Dubnicek, Ted Underwood, J. Stephen Downie
Published in: CHR (2021)
Keyphrases
- pattern recognition
- classification accuracy
- preprocessing
- decision trees
- text classification
- cutting edge
- domain specific
- character recognition
- pattern classification
- machine learning methods
- classification scheme
- automatic recognition
- post processing
- feature vectors
- feature selection
- machine learning
- data sets
- optical character recognition
- graduate students
- book covers
- automatic classification
- data mining
- classification models
- researchers and practitioners
- domain experts
- training set
- book presents
- error correction
- document images
- benchmark datasets
- training samples
- face recognition
- text recognition