LAPCA: Language-Agnostic Pretraining with Cross-Lingual Alignment.
Dmitry Abulkhanov, Nikita Sorokin, Sergey Nikolenko, Valentin Malykh
Published in: SIGIR (2023)
Keyphrases
- cross lingual
- parallel corpus
- word alignment
- european languages
- language specific
- machine translation
- monolingual and cross lingual
- indian languages
- linguistic resources
- cross lingual information retrieval
- cross language
- language independent
- comparable corpora
- language modeling
- source language
- machine translation system
- target language
- event extraction
- bilingual dictionaries
- natural language
- news articles
- transfer learning
- document clustering
- text classification
- parallel corpora
- statistical machine translation
- translation model
- computational linguistics
- data mining
- query translation
- text documents
- language model
- machine learning