A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing.
Aishik Rakshit, Samyak Mehta, Anirban Dasgupta
Published in: CoRR (2023)
Keyphrases
- post processing
- optical character recognition
- natural language processing
- preprocessing
- character recognition
- text recognition
- document images
- OCR systems
- machine learning
- information extraction
- printed documents
- knowledge representation
- page segmentation
- character segmentation
- natural language
- filtering method
- median filtering
- text processing
- handwriting recognition
- real time
- scanned documents
- pattern extraction
- human experts
- text mining
- image binarization
- machine vision
- edge detection
- word spotting
- projections onto convex sets