Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs.
Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia
Published in: ICLR (2024)
Keyphrases
- fine tuning
- data sets
- database
- original data
- raw data
- data structure
- data collection
- statistical analysis
- computer systems
- knowledge discovery
- end users
- data analysis
- prior knowledge
- neural network
- big data
- data quality
- machine learning
- application domains
- domain experts
- databases
- data points
- training data
- high dimensional data
- synthetic data
- missing data
- general purpose
- experimental data
- data distribution
- multimedia data
- case study
- image data
- xml documents
- data model