Researching Less-Resourced Languages - the DigiSami Corpus.
Kristiina Jokinen
Published in: LREC (2018)
Keyphrases
- statistical machine translation
- expressive power
- language independent
- spoken dialog
- databases
- test set
- manually annotated
- supervised machine learning
- data sets
- language identification
- target language
- query translation
- noun phrases
- language modeling
- first order logic
- machine translation system
- co-occurrence
- XML documents
- parallel corpora
- English text
- parallel corpus
- Arabic language
- database
- sentence pairs