Reward Model Ensembles Help Mitigate Overoptimization.
Thomas CosteUsman AnwarRobert KirkDavid KruegerPublished in: CoRR (2023)
Keyphrases
- probabilistic model
- experimental data
- neural network
- machine learning
- multi agent
- cost function
- formal model
- generative model
- computational model
- learning models
- classification models
- neural network model
- conceptual model
- hierarchical structure
- mathematical model
- theoretical analysis
- management system
- high level
- case study