Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification.
Aditya Mogadala, Achim Rettinger
Published in: HLT-NAACL (2016)
Keyphrases
- parallel corpora
- cross language
- text classification
- cross lingual
- text categorization
- n gram
- bilingual dictionaries
- language independent
- bilingual lexicon
- out of vocabulary
- parallel corpus
- cross language information retrieval
- sentence pairs
- word pairs
- translation model
- comparable corpora
- machine translation system
- statistical machine translation
- word segmentation
- word alignment
- sentence level
- text retrieval
- bag of words
- document retrieval
- cross lingual information retrieval
- sentiment analysis
- query translation
- sentiment classification
- machine learning
- text mining
- question answering
- text classifiers
- feature selection
- cross language retrieval
- language modeling
- Chinese English
- text documents
- document collections
- semantic features
- machine translation
- labeled data
- k nearest neighbor
- co occurrence
- vector space
- information access
- source language
- unlabeled data
- knn
- information retrieval
- linguistic resources
- language model
- semantic similarity
- keywords