Line and Word Matching in Old Documents
A. Marcolino, Vitorino Ramos, Mário Ramalho, João Rogério Caldas Pinto
Published in: CoRR (2004)
Keyphrases
- word spotting
- word frequencies
- string matching
- keywords
- text corpus
- matching algorithm
- information retrieval
- word frequency
- co occurrence
- spoken documents
- document retrieval
- latent topics
- sentence level
- information retrieval systems
- pattern matching
- index terms
- natural language text
- metadata
- printed documents
- term frequency
- web documents
- page layout
- document clustering
- word similarity
- xml documents
- sentence similarity
- multiword
- training corpus
- related words
- related documents
- word co occurrence
- document images
- retrieval systems
- multi document summarization
- relevant documents
- word pairs
- text lines
- linguistic information
- text documents
- spoken document retrieval
- indian languages
- stop words
- n gram
- document space
- concept space
- word segmentation
- similarity scores
- arabic documents
- document analysis