CroissantLLM: A Truly Bilingual French-English Language Model.
Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro Henrique Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo
Published in: CoRR (2024)
Keyphrases
- language model
- cross language retrieval
- monolingual retrieval
- language modeling
- cross lingual
- statistical machine translation
- cross language
- translation model
- document retrieval
- chinese english
- comparable corpora
- n gram
- multiword
- machine translation
- retrieval model
- cross language information retrieval
- multilingual retrieval
- word alignment
- query translation
- probabilistic model
- parallel corpus
- information retrieval
- language modelling
- document collections
- speech recognition
- query terms
- parallel corpora
- language independent
- query expansion
- test collection
- out of vocabulary
- mixture model
- source language
- cross lingual information retrieval
- ad hoc information retrieval
- bilingual dictionaries
- context sensitive
- retrieval effectiveness
- dublin city university
- smoothing methods
- relevance model
- feature selection