A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages.
Dwaipayan Roy, Sumit Bhatia, Prateek Jain
Published in: LREC (2020)
Keyphrases
- Wikipedia articles
- comparable corpora
- parallel corpora
- information asymmetry
- language independent
- semantic relatedness
- cross lingual
- semantic features
- search queries
- text corpora
- link structure
- news articles
- user generated content
- cross language information retrieval
- cross language
- language modeling
- machine translation
- digital libraries
- low level features
- statistical machine translation
- WordNet
- social media