A Novel Machine Learning Based Approach for Post-OCR Error Detection.
Shafqat Mumtaz Virk, Dana Dannélls, Azam Sheikh Muhammad
Published in: RANLP (2021)
Keyphrases
- error detection
- error correction
- machine learning
- error recovery
- error correcting
- data cleansing
- optical character recognition
- machine learning algorithms
- fault tolerance
- document images
- machine learning methods
- error control
- learning algorithm
- feature selection
- preprocessing
- fault isolation
- post processing
- error resilient
- data mining
- artificial intelligence
- watermarking scheme
- information extraction
- document analysis
- decision trees
- channel coding
- natural language processing
- character recognition
- text mining
- response time
- text classification