Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings.
Ewald van der Westhuizen, Thomas Niesler
Published in: INTERSPEECH (2017)
Keyphrases
- n-gram
- training corpus
- English text
- statistical machine translation
- language specific
- word level
- stop words
- English words
- unknown words
- character n-grams
- language model
- machine translation
- part of speech
- computing semantic relatedness
- binary codes
- multiword
- word segmentation
- named entities
- lexical information
- parallel corpus
- target language
- co-occurrence
- English language
- text classification
- source code
- machine translation system
- natural language
- compound words
- named entity recognizer
- language learning
- translation model
- short list
- natural language processing
- high speed
- bilingual dictionaries
- language identification
- distance measure
- low dimensional
- vector space
- manifold learning
- word sense disambiguation
- cross-language
- Indian languages
- language independent
- word sense
- parallel corpora
- word recognition
- word order
- cross-language information retrieval
- cross-lingual
- query translation
- answer questions