MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling.
Tomasz Limisiewicz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, Luke Zettlemoyer
Published in: CoRR (2024)
Keyphrases
- language modeling
- cross lingual
- language model
- information retrieval
- comparable corpora
- retrieval model
- query expansion
- probabilistic model
- n gram
- language independent
- text classification
- cross language
- statistical language models
- sentence retrieval
- statistical language modeling
- document retrieval
- relevance model
- improvements in retrieval effectiveness
- language modeling approaches
- test collection
- digital libraries
- information retrieval systems