Building a 50M Corpus of Tajik Language.
Gulshan Dovudov, Jan Pomikálek, Vit Suchomel, Pavel Smerk
Published in: RASLAN (2011)
Keyphrases
- spanish language
- programming language
- language learning
- language processing
- parallel corpus
- operational semantics
- supervised machine learning
- high level
- natural language
- natural language processing
- test set
- context dependent
- computational linguistics
- manually annotated
- comparable corpora
- probabilistic context free grammars