Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles.
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- agent technology
- state space
- machine learning
- reinforcement learning algorithms
- learning algorithm
- model free
- function approximation
- markov decision processes
- multi agent
- decision trees
- reward function
- human subjects
- learning process
- temporal difference
- human operators
- reinforcement learning methods
- optimal policy
- partially observable environments
- uncertain data
- eligibility traces
- motor skills
- reward signal
- fuzzy logic
- policy gradient
- average reward
- learning agent
- learning capabilities
- decision making
- ensemble methods
- dynamic programming
- electronic commerce