Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs.
Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia
Published in: CoRR (2024)
Keyphrases
- fine tuning
- data sets
- image data
- database
- data processing
- data collection
- data sources
- raw data
- data distribution
- statistical analysis
- synthetic data
- complex data
- relational databases
- small number
- high quality
- prior knowledge
- information sources
- computer systems
- information systems
- statistical methods
- missing data
- genetic algorithm
- data mining techniques
- domain specific
- data points
- xml documents