Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards.
Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- Markov decision processes
- function approximation
- state space
- machine learning
- model-free
- reward signal
- learning process
- relevance feedback
- optimal policy
- action space
- reinforcement learning algorithms
- human operators
- learning algorithm
- bandit problems
- temporal difference
- human behavior
- learning problems
- decision problems
- dynamical systems
- reward function
- human activities
- multi-agent
- robotic control
- decision making