Evaluating Word Embeddings for Indonesian-English Code-Mixed Text Based on Synthetic Data.
Arra'di Nur Rizal, Sara Stymne
Published in: CodeSwitch@LREC (2020)
Keyphrases
- synthetic data
- machine translation
- target language
- statistical machine translation
- english words
- english text
- word level
- computing semantic relatedness
- language resources
- stop words
- binary codes
- parallel corpus
- language specific
- machine translation system
- english chinese
- real image data
- real world
- data sets
- n-gram
- word sense disambiguation
- lexical information
- word sense
- co-occurrence
- english language
- multiword
- low dimensional
- source language
- unknown words
- language independent
- character n-grams
- natural language
- compound words
- cross lingual
- vector space
- word recognition
- language model
- word order
- information retrieval systems
- source code
- training corpus
- answer questions
- mri data
- geographic information retrieval
- indian languages
- dimensionality reduction
- language identification
- natural language processing
- high dimensional
- translation model
- short list
- word segmentation