Mapping Transformer Leveraged Embeddings for Cross-Lingual Document Representation.
Tsegaye Misikir Tashu, Eduard-Raul Kontos, Matthia Sabatelli, Matias Valdenegro-Toro
Published in: CoRR (2024)
Keyphrases
- cross-lingual
- document representation
- document clustering
- language modeling
- vector space
- machine translation
- language model
- bag of words
- text classification
- dimensionality reduction
- information retrieval
- news articles
- text documents
- transfer learning
- vector space model
- data fusion
- document collections
- n-gram
- machine learning