Reward Uncertainty for Exploration in Preference-based Reinforcement Learning.
Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
Published in: ICLR (2022)
Keyphrases
- reinforcement learning
- exploration strategy
- action selection
- exploration exploitation
- function approximation
- state space
- partial observability
- active exploration
- multi-agent
- model based reinforcement learning
- reinforcement learning algorithms
- partially observable
- reward function
- temporal difference
- eligibility traces
- machine learning
- model free
- Markov decision processes
- learning algorithm
- optimal policy
- exploration exploitation tradeoff
- conditional probabilities
- partially observable environments
- transfer learning
- neural network
- dynamic programming
- supervised learning
- policy gradient
- average reward
- decision theory
- reinforcement learning methods
- policy search
- Markov decision process
- reward shaping
- dynamical systems
- learning capabilities
- expected utility
- uncertain data
- balancing exploration and exploitation
- belief functions