Using Transfer Learning to contextually Optimize Optical Character Recognition (OCR) output and perform new Feature Extraction on a digitized cultural and historical dataset.
Aravind Inbasekaran, Rajesh Kumar Gnanasekaran, Richard Marciano
Published in: IEEE BigData (2021)
Keyphrases
- optical character recognition
- transfer learning
- historical manuscripts
- feature extraction
- OCR systems
- character recognition
- knowledge transfer
- text recognition
- document images
- cross domain
- labeled data
- word spotting
- reinforcement learning
- character segmentation
- machine learning algorithms
- active learning
- semi supervised learning
- feature set
- machine learning
- scanned documents
- printed documents
- page segmentation
- transfer knowledge
- handwritten document images
- text classification
- collaborative filtering
- learning algorithm
- cross lingual
- handwriting recognition
- feature vectors
- feature space
- printed text
- image processing
- speech recognition
- unlabeled data
- text mining
- feature selection