Optical Character Recognition for Degraded Text Documents.
Sudip Sanyal, Kapil Dev Dhingra, Pramod Kumar Sharma
Published in: IMECS (2007)
Keyphrases
- text documents
- optical character recognition
- ocr systems
- text mining
- document images
- character recognition
- text categorization
- text recognition
- text classification
- keywords
- text analysis
- information extraction
- topic models
- wordnet
- document classification
- news articles
- tf-idf
- handwriting recognition
- bag of words
- document clustering
- named entities
- textual information
- scanned documents
- printed documents
- databases
- term frequency
- machine vision
- text extraction
- automatic text categorization
- text collections
- image features
- knn
- natural language
- multiscale