Adapting the Tesseract open source OCR engine for multilingual OCR

Raymond Smith, Daria Antonova, Dar-Shyang Lee

Published in: MOCR@ICDAR (2009)

Keyphrases

optical character recognition
open source
document images
character recognition
post processing
text recognition
document processing
error correction
preprocessing
scanned documents
recognition errors
digital libraries
OCR systems
case study
language independent
information retrieval
printed documents
neural network
character segmentation
database
source code
information systems
document analysis
core components
real time