Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning.
Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- learning algorithm
- eligibility traces
- learning process
- function approximation
- partially observable environments
- state space
- learning agent
- learning problems
- supervised learning
- reinforcement learning algorithms
- Markov decision processes
- partially observable
- machine learning
- active learning
- multi-agent
- bandit problems
- actor critic
- policy gradient
- state action
- optimal policy
- online learning
- dynamic programming
- temporal difference
- complex domains
- action selection
- prior knowledge
- temporal difference learning
- reinforcement learning methods
- learning systems
- evolutionary learning
- inverse reinforcement learning
- knowledge acquisition
- agent learns
- reward shaping