Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF.
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
Published in: CoRR (2024)
Keyphrases
- data sets
- raw data
- statistical analysis
- data processing
- high quality
- experimental data
- data sources
- data collection
- database
- complex data
- training data
- original data
- sensor data
- computer systems
- image data
- data points
- end users
- wireless sensor networks
- data analysis
- data mining
- labeled data
- synthetic data
- privacy preserving
- missing data
- spatial data
- high dimensional
- neural network
- data quality
- support vector