CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset.
Zheng Chen, Hongyu Lin
Published in: LREC (2022)
Keyphrases
- cross lingual
- multi lingual
- mono lingual
- language modeling
- machine translation
- word sense
- cross lingual information retrieval
- event extraction
- language independent
- text summarization
- text classification
- machine translation system
- indian languages
- information retrieval
- cross language
- text documents
- news articles
- translation model
- open domain
- parallel corpus
- transfer learning
- text mining
- language model
- document clustering
- keywords
- text retrieval
- document retrieval
- web documents