OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models.
Nikolay Bogoychev, Jelmer van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindrich Helcl, Mikko Aulamo
Published in: CoRR (2023)
Keyphrases
- language model
- machine translation
- statistical machine translation
- language modeling
- cross lingual
- n gram
- document retrieval
- speech recognition
- retrieval model
- natural language processing
- cross language information retrieval
- information extraction
- language independent
- target language
- test collection
- probabilistic model
- machine translation system
- information retrieval
- translation model
- language processing
- Chinese English
- cross language retrieval
- relevance model
- query terms
- out of vocabulary
- query expansion
- natural language
- pseudo relevance feedback
- parallel corpora
- word level
- sentence retrieval
- context sensitive
- ad hoc information retrieval
- vector space model
- document level
- Bayesian networks
- source language
- cross language
- bag of words