A Corpus for Multilingual Document Classification in Eight Languages.
Holger Schwenk, Xian Li
Published in: CoRR (2018)
Keyphrases
- document classification
- language independent
- cross lingual
- text classification
- comparable corpora
- parallel corpus
- natural language text
- text documents
- language specific
- statistical machine translation
- text categorization
- machine translation system
- parallel corpora
- cross language information retrieval
- text mining
- cross language
- web documents
- classification algorithm
- topic extraction
- word alignment
- machine translation
- n gram
- news articles
- query translation
- bag of words
- digital libraries
- target language
- databases
- topic models
- k nearest neighbor
- natural language processing
- image features
- keywords
- real world