Countering Reward Over-Optimization in LLM with Demonstration-Guided Reinforcement Learning.
Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
Published in: ACL (Findings) (2024)
Keyphrases
- reinforcement learning
- optimization algorithm
- state space
- function approximation
- partially observable environments
- model free
- global optimization
- optimal control
- optimization methods
- optimization problems
- optimization method
- learning problems
- eligibility traces
- multi agent reinforcement learning
- learning algorithm
- optimal policy
- function approximators
- machine learning
- reward function
- reinforcement learning algorithms
- optimization process
- neural network
- constrained optimization
- hidden markov models
- data sets
- transfer learning