Trainer beware: corpora for language/encoding identification.
Florence Reeder
Published in: LREC (1998)
Keyphrases
- parallel corpus
- language learning
- programming language
- data sets
- comparable corpora
- English language
- object oriented programming
- language processing
- object oriented
- natural language
- database
- natural language processing
- modeling language
- wavelet transform
- information extraction
- computational linguistics
- linguistic knowledge
- machine learning
- linguistic patterns