Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages.
Corinne Aars, Lauren Adams, Xiaokan Tian, Zhaoyu Wang, Colton Wismer, Jason Wu, Pablo Rivas, Korn Sooksatra, Matthew Fendt
Published in: CoRR (2024)
Keyphrases
- language resources
- cross lingual
- machine translation
- comparable corpora
- cross lingual information retrieval
- cross language information retrieval
- machine translation system
- language independent
- query translation
- parallel corpora
- bilingual dictionaries
- cross language
- statistical machine translation
- language specific
- target language
- multi lingual
- multilingual information retrieval
- natural language generation
- translation model
- parallel corpus
- chinese english
- linguistic resources
- text generation
- multilingual documents
- text documents
- source language
- news articles
- natural language
- training corpus
- multilingual retrieval
- information access
- native language
- word alignment
- cross language retrieval
- english words
- out of vocabulary
- cross language ir
- broadcast news
- domain dependent
- word level
- natural language text
- query terms
- document retrieval
- expressive power
- information extraction
- digital libraries
- word pairs
- indian languages
- databases
- multiword
- word segmentation
- language modeling
- word sense disambiguation
- document collections
- text classification
- natural language processing
- metadata