AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs.
Feiyang Kang, Yifan Sun, Bingbing Wen, Si Chen, Dawn Song, Rafid Mahmood, Ruoxi Jia
Published in: CoRR (2024)
Keyphrases
- data sets
- high quality
- data analysis
- data processing
- complex data
- data mining techniques
- data sources
- prior knowledge
- historical data
- data quality
- data distribution
- missing data
- sensor data
- semi automatic
- labelled data
- correlation analysis
- database
- data objects
- missing values
- synthetic data
- training samples
- data collection
- small number
- knowledge discovery
- probability distribution
- data mining
- neural network