Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning.
Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- optimization algorithm
- function approximation
- machine learning
- state space
- optimization problems
- global optimization
- learning algorithm
- optimal policy
- reward function
- reinforcement learning algorithms
- constrained optimization
- eligibility traces
- temporal difference
- model free
- optimization process
- optimization method
- multi agent
- finite state
- data sets
- action selection
- learning problems
- transfer learning
- partially observable
- average reward
- monte carlo
- robotic control
- partially observable environments