Adapting the Tesseract open source OCR engine for multilingual OCR.
Raymond Smith, Daria Antonova, Dar-Shyang Lee
Published in: MOCR@ICDAR (2009)
Keyphrases
- optical character recognition
- open source
- document images
- character recognition
- post processing
- text recognition
- document processing
- error correction
- preprocessing
- scanned documents
- recognition errors
- digital libraries
- OCR systems
- case study
- language independent
- information retrieval
- printed documents
- neural network
- character segmentation
- database
- source code
- information systems
- document analysis
- core components
- real time