Multilingual document language recognition for creating corpora.
Yevgeny Ludovik, Ron Zacharski
Published in: MTSummit (1999)
Keyphrases
- parallel corpus
- comparable corpora
- multilingual documents
- Indian languages
- document images
- document analysis
- recognition rate
- cross lingual
- text documents
- multilingual information retrieval
- language specific
- language resources
- object recognition
- information retrieval
- feature extraction
- linguistic resources
- programming language
- web documents
- document collections
- printed documents
- digital libraries
- language independent
- news articles
- document corpus
- cross language information retrieval
- handwriting recognition
- Chinese English
- natural language
- recognition algorithm
- information retrieval systems
- text collections
- text corpora
- parallel corpora
- character recognition
- document clustering
- extensible markup language
- wide coverage
- text summarization
- handwritten documents
- source language
- text corpus
- machine translation system
- text lines
- cross language
- document retrieval
- natural language processing
- face recognition