MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer.
Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos
Published in: EMNLP (1) (2021)
Keyphrases
- document classification
- cross lingual
- multi label
- multi lingual
- text classification
- text categorization
- transfer learning
- language independent
- text documents
- language modeling
- text mining
- naive bayes
- bag of words
- feature selection
- labeled data
- n gram
- knn
- k nearest neighbor
- machine learning
- machine translation
- unsupervised learning
- unlabeled data
- nearest neighbor
- document clustering
- news articles
- classification algorithm
- data mining
- search engine
- information retrieval