AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages.
Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Oreen Yousuf, Salomey Osei, Abigail Oppong, Iyanuoluwa Shode, Oluwabusayo Olufunke Awoyomi, Chris Chinenye Emezue
Published in: CoRR (2022)
Keyphrases
- language model
- language modeling
- cross lingual
- active learning
- language independent
- n gram
- comparable corpora
- cross lingual information retrieval
- document retrieval
- translation model
- statistical machine translation
- cross language
- information retrieval
- language modelling
- query expansion
- probabilistic model
- retrieval model
- speech recognition
- mixture model
- bilingual dictionaries
- test collection
- transfer learning
- statistical language models
- machine translation system
- learning algorithm
- linguistic resources
- indian languages
- machine learning
- ad hoc information retrieval
- digital libraries
- word segmentation
- parallel corpora
- training set
- relevance model
- query translation
- semi supervised
- context sensitive
- labeled data
- pseudo relevance feedback
- cross language information retrieval
- smoothing methods
- feature selection
- relevance feedback
- cross language retrieval
- query terms
- word clouds
- language model for information retrieval