Cultural Topic Modelling over Novel Wikipedia Corpora for South-Slavic Languages.
Filip Markoski, Elena Markoska, Nikola Ljubesic, Eftim Zdravevski, Ljupco Kocarev
Published in: RANLP (2021)
Keyphrases
- document corpus
- topic segmentation
- Wikipedia articles
- parallel corpora
- text corpora
- linguistic resources
- comparable corpora
- statistical machine translation
- language independent
- knowledge base
- cross-cultural
- expressive power
- short texts
- concept space
- topic models
- text corpus
- natural language processing
- WordNet
- cross-lingual
- parallel corpus
- target language
- machine translation
- news articles
- text summarization
- semantic relations
- sentence pairs
- information retrieval
- named entity disambiguation
- cultural differences
- n-gram
- entity ranking
- semantic information