Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval.
Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin, Daniel Cer
Published in: CoRR (2023)
Keyphrases
- multilingual information retrieval
- training data
- language independent
- cross lingual
- multi lingual
- indian languages
- data sets
- language specific
- multilingual documents
- training set
- cross lingual information retrieval
- databases
- query expansion
- digital libraries
- information retrieval systems
- language resources
- federated search
- image database
- machine translation
- learning algorithm
- multilingual retrieval
- heterogeneous collections
- prior knowledge
- expressive power
- language identification
- document retrieval
- information retrieval
- training samples
- supervised learning
- cross language
- handwritten documents
- decision trees
- retrieval model
- training process
- natural language
- classification models
- image retrieval
- retrieval systems
- test collection
- test set
- multilingual search
- comparable corpora
- cross language information retrieval
- text retrieval
- test data
- document images
- class labels
- training examples
- machine learning