Pre-processing English-Hindi Corpus for Statistical Machine Translation.
Karunesh Kumar Arora, Shyam S. Agrawal
Published in: Computación y Sistemas (2017)
Keyphrases
- statistical machine translation
- training corpus
- preprocessing
- machine translation
- comparable corpora
- language identification
- link grammar
- cross lingual
- person names
- target language
- mono lingual
- indian languages
- parallel corpus
- cross language information retrieval
- machine translation system
- open domain
- parallel corpora
- source language
- feature extraction
- language model
- text classification
- post processing
- translation model
- chinese english
- broad coverage
- multiword
- proper names
- contextual features
- information extraction
- wide coverage
- part of speech
- query translation
- english language
- english words
- preprocessing step
- text corpora
- sentence pairs
- noun phrases
- word order
- english text
- text documents
- contextual information
- news articles