Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data.
Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
Published in: CoRR (2024)
Keyphrases
- fine tuning
- data sets
- data collection
- data sources
- synthetic data
- database
- complex data
- data objects
- original data
- data processing
- sensor data
- decision trees
- raw data
- data acquisition
- data analysis
- databases
- input data
- data mining techniques
- knowledge discovery
- missing data
- experimental data
- statistical analysis
- data structure
- data quality
- historical data
- data points