A Corpus for Multilingual Document Classification in Eight Languages.
Holger Schwenk, Xian Li
Published in: LREC (2018)
Keyphrases
- document classification
- language independent
- cross lingual
- text classification
- comparable corpora
- parallel corpus
- natural language text
- text documents
- parallel corpora
- text categorization
- machine translation system
- language specific
- statistical machine translation
- text mining
- cross language
- machine translation
- cross language information retrieval
- web documents
- topic extraction
- automatic document classification
- news articles
- query translation
- n gram
- classification algorithm
- word alignment
- machine learning
- bag of words
- knn
- document clustering
- knowledge representation