Script Identification in Printed Bilingual Documents.
D. Dhanya, A. G. Ramakrishnan
Published in: Document Analysis Systems (2002)
Keyphrases
- music scores
- digital documents
- parallel corpora
- information retrieval
- document collections
- multiword
- xml documents
- web documents
- information retrieval systems
- scanned documents
- machine translation
- cross lingual
- vector space model
- text documents
- metadata
- parallel corpus
- document classification
- document retrieval
- cross language information retrieval
- document clustering
- digital libraries
- relevant documents
- text retrieval
- vector space
- bilingual lexicon
- scanned images
- chinese english
- text classification
- source language
- document analysis
- retrieved documents
- latent semantic analysis
- cross language