Improved shape code based word matching for multi-script documents.
Tanmoy Mondal, Arundhati Tarafdar, Nicolas Ragot, Jean-Yves Ramel, Umapada Pal
Published in: ACPR (2015)
Keyphrases
- word frequencies
- Indian languages
- word spotting
- shape matching
- information retrieval
- text corpus
- keywords
- document collections
- shape recognition
- XML documents
- short list
- string matching
- multiword
- matching algorithm
- relevant documents
- text documents
- shape model
- Arabic documents
- information retrieval systems
- related words
- word pairs
- shape descriptors
- metadata
- latent topics
- free form objects
- natural language text
- pattern matching
- image matching
- document clustering
- shape analysis
- cross-lingual
- sentence level
- document analysis
- spoken documents
- document retrieval
- linguistic information
- related documents
- web documents
- stop words
- page layout
- word co-occurrence
- co-occurrence
- text lines
- source code
- n-gram
- multi-document summarization
- document images
- printed documents
- term frequency
- concept space
- text corpora