Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?
Xiang Zhang, Yann LeCun
Published in: CoRR (2017)
Keyphrases
- text classification
- chinese english
- machine translation
- machine translation system
- cross language information retrieval
- wordnet
- linguistic resources
- out of vocabulary
- bag of words
- cross lingual
- statistical machine translation
- text categorization
- term frequency
- n gram
- word segmentation
- labeled data
- text mining
- sentiment analysis
- feature selection
- text data
- text documents
- text classifiers
- machine learning
- linguistic features
- language independent
- translation model
- unlabeled data
- knn
- parallel corpora
- semantic similarity
- sentiment classification
- active learning
- artificial intelligence