Evaluating OCR and Non-OCR Text Representations for Learning Document Classifiers.
Markus Junker, Rainer Hoch
Published in: ICDAR (1997)
Keyphrases
- document processing
- document images
- printed documents
- information retrieval
- document analysis
- optical character recognition
- text documents
- scanned documents
- web documents
- post processing
- text recognition
- learning algorithm
- learning tasks
- error correction
- text mining
- learning process
- training set
- preprocessing
- reinforcement learning
- training data
- decision trees
- OCR systems
- scanned images
- feature representations
- document retrieval
- semantic information
- query expansion
- supervised learning
- active learning