From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning.
Ming Li, Yong Zhang, Zhitao Li, Jiuhai Chen, Lichang Chen, Ning Cheng, Jianzong Wang, Tianyi Zhou, Jing Xiao
Published in: CoRR (2023)
Keyphrases
- multi class
- high quality
- data sets
- base classifiers
- image data
- low quality
- database
- complex data
- noisy data
- original data
- raw data
- sensor data
- statistical analysis
- data processing
- data points
- data sources
- small number
- training data
- training samples
- synthetic data
- data quality
- data distribution
- prior knowledge
- data analysis
- learning algorithm
- knowledge discovery
- search engine
- high dimensional
- experimental data
- probability distribution
- model selection
- data collection