Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval.
Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin, Daniel Cer
Published in: NAACL-HLT (2024)
Keyphrases
- monolingual retrieval
- cross lingual
- training data
- query translation
- cross language information retrieval
- cross lingual information retrieval
- machine translation
- language independent
- multilingual retrieval
- test data
- learning algorithm
- multi lingual
- data sets
- language resources
- supervised learning
- indian languages
- prior knowledge
- training set
- training examples
- language specific
- text classification
- decision trees
- classification accuracy
- training process
- translation model
- machine translation system
- training samples
- relevance feedback
- test collection
- image database
- multilingual documents
- active learning
- text retrieval
- information extraction