Word-Wise Script Identification from Indian Documents.
Suranjit Sinha, Umapada Pal, B. B. Chaudhuri
Published in: Document Analysis Systems (2004)
Keyphrases
- word frequencies
- document collections
- word spotting
- text corpus
- keywords
- web documents
- printed documents
- page layout
- document retrieval
- natural language text
- information retrieval systems
- index terms
- word frequency
- text documents
- term frequency
- information retrieval
- linguistic information
- vector space model
- co-occurrence
- stop words
- document analysis
- sentence level
- latent topics
- multiword
- word pairs
- document classification
- word similarity
- relevant documents
- word co-occurrence
- sentence similarity
- related words
- spoken documents
- term weighting
- document clustering
- metadata
- training corpus
- concept space
- document space
- n-gram
- document representation
- document images
- wordnet
- xml documents
- arabic documents