A Word Association Based Approach for Improving Retrieval Performance from Noisy OCRed Text.
Anirban Chakraborty, Kripabandhu Ghosh, Utpal Roy
Published in: KDIR (2014)
Keyphrases
- english text
- text input
- sentence level
- word counts
- keywords
- related words
- information retrieval
- printed documents
- natural language text
- english words
- word pairs
- text mining
- co-occurrence
- text corpus
- text segments
- multiword
- word level
- string matching
- syntactic categories
- text retrieval
- sentence similarity
- linguistic information
- chinese text
- printed text
- lexical features
- word recognition
- lexical information
- historical manuscripts
- page layout
- text corpora
- word sense
- compound words
- noun phrases
- handwritten words
- spoken documents
- punctuation marks
- word co-occurrence
- stop words
- word frequency
- syntactic information
- text documents
- training corpus
- noisy environments
- handwriting recognition
- word segmentation
- natural language generation