Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning.
Yinglun Xu, David Zhu, Rohan Gumaste, Gagandeep Singh
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- average reward
- total reward
- dynamic programming
- control policy
- optimal control
- function approximation
- reward function
- Markov decision processes
- reinforcement learning algorithms
- optimal policy
- state space
- eligibility traces
- image segmentation
- worst case
- long run
- partially observable environments
- unsupervised learning
- initially unknown
- approximate dynamic programming
- temporal difference
- learning algorithm
- model free
- multi-armed bandit
- learning agent
- action selection
- real time
- multi agent
- machine learning
- partially observable
- reinforcement learning methods
- policy gradient
- multi criteria
- transfer learning
- reward shaping