An OCR for Classical Indic Documents Containing Arbitrarily Long Words.
Agam Dwivedi, Rohit Saluja, Ravi Kiran Sarvadevabhatla
Published in: CVPR Workshops (2020)
Keyphrases
- printed documents
- word recognition
- optical character recognition
- text documents
- word spotting
- document analysis
- character recognition
- document processing
- text recognition
- document images
- historical documents
- document representation
- keywords
- text lines
- scanned documents
- word frequencies
- recognition errors
- printed text
- semantic relationships
- handwriting recognition
- OCR systems
- document content
- information retrieval
- index terms
- word pairs
- document collections
- page layout
- Arabic documents
- text mining
- multiword
- training documents
- information retrieval systems
- topic hierarchy
- latent topics
- document level
- related words
- document clustering
- word co-occurrence
- scanned images
- text categorization
- co-occurrence
- historical manuscripts
- document image retrieval
- sentiment polarity
- keyword extraction
- word frequency
- XML documents
- text corpus
- text corpora
- retrieval systems
- vector space model
- document retrieval
- relevant documents
- metadata
- web documents
- n-gram
- language independent
- word segmentation
- semantically related
- linguistic information
- stop words