Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings.
Naoki Otani, Satoru Ozaki, Xingyuan Zhao, Yucen Li, Micaelah St Johns, Lori S. Levin
Published in: EMNLP (1) (2020)
Keyphrases
- multiword
- cross-lingual
- language modeling
- language model
- bilingual dictionaries
- n-gram
- context sensitive
- machine translation
- statistical machine translation
- translation model
- cross-language
- language independent
- character n-grams
- document representation
- text clustering
- vector space
- document clustering
- part of speech
- text classification
- named entities
- query translation
- natural language
- information retrieval
- machine learning
- retrieval model
- low dimensional
- dimensionality reduction
- co-occurrence
- probabilistic model
- machine translation system
- parallel corpora
- query terms
- web documents