Dense Reward for Free in Reinforcement Learning from Human Feedback.
Alex J. Chan, Hao Sun, Samuel Holt, Mihaela van der Schaar
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- function approximation
- state space
- model free
- reinforcement learning algorithms
- eligibility traces
- learning algorithm
- partially observable environments
- reward function
- markov decision processes
- machine learning
- optimal policy
- multi agent
- human operators
- temporal difference
- computational models
- average reward
- human subjects
- action selection
- relevance feedback
- dynamic programming
- long run
- learning process
- partially observable
- stereo correspondence
- policy iteration
- learning agent
- action space
- human interaction
- policy search
- user feedback
- behavioural cloning