Leveraging Text Repetitions and Denoising Autoencoders in OCR Post-correction.
Kai Hakala, Aleksi Vesanto, Niko Miekka, Tapio Salakoski, Filip Ginter
Published in: CoRR (2019)
Keyphrases
- denoising
- text recognition
- image denoising
- noisy images
- printed documents
- error correction
- total variation
- optical character recognition
- noise removal
- wavelet domain
- document processing
- natural images
- image processing
- text extraction
- denoising algorithm
- document analysis
- scanned documents
- information retrieval
- gaussian noise
- ocr systems
- page layout
- printed text
- document images
- character recognition
- scanned images
- text mining
- preprocessing
- text processing
- free text
- text retrieval
- denoising methods
- keywords
- database
- neural network
- recognition errors
- wavelet denoising
- text detection
- web documents
- machine learning
- text lines
- wavelet packet