Extracting Words and Multi-Part Symbols in Graphics Rich Documents.
Mark James Burge, Gladys Monagan
Published in: ICIAP (1995)
Keyphrases
- text documents
- word spotting
- document representation
- word frequencies
- keywords
- related words
- multiword
- document collections
- text corpus
- index terms
- word frequency
- topic hierarchy
- latent topics
- document content
- information retrieval
- stop words
- XML documents
- person names
- information retrieval systems
- text corpora
- semantic relationships
- web documents
- word co-occurrence
- linguistic information
- related documents
- word pairs
- semantically related
- keyword extraction
- file formats
- document clustering
- text categorization
- Indian languages
- document retrieval
- relevant documents
- document level
- retrieval systems
- document space
- metadata
- distributional clustering
- textual features
- training documents
- historical documents
- printed documents
- multimedia
- Arabic documents
- natural language text
- hand-drawn
- word sense disambiguation
- text mining
- topic models
- computer graphics
- semantic information
- document images
- automatic text classification
- finite alphabet
- co-occurrence
- n-gram
- handwritten documents
- bag of words
- query terms
- Wikipedia articles
- document analysis
- word segmentation
- training corpus