Wikipedia as an SMT Training Corpus.
Dan Tufis, Radu Ion, Stefan Daniel Dumitrescu, Dan Stefanescu
Published in: RANLP (2013)
Keyphrases
- training corpus
- statistical machine translation
- machine translation
- text classification
- translation model
- WordNet
- training data
- part of speech
- named entities
- language model
- word alignment
- knowledge base
- machine translation system
- semantic relations
- cross language information retrieval
- parallel corpora
- link structure
- natural language processing
- Wikipedia articles
- word sense disambiguation
- target language
- document representation
- semantic information
- document collections
- knowledge discovery
- decision trees