MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer.
Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
Published in: CoRR (2021)
Keyphrases
- document classification
- cross lingual
- multi label
- multi lingual
- text classification
- text categorization
- transfer learning
- language independent
- text documents
- text mining
- labeled data
- bag of words
- n gram
- naive bayes
- language modeling
- machine learning
- knn
- semi supervised learning
- feature selection
- news articles
- unlabeled data
- machine translation
- document clustering
- feature set
- k nearest neighbor
- neural network
- feature space
- unsupervised learning
- generative model
- data analysis