Dueling Posterior Sampling for Preference-Based Reinforcement Learning.
Ellen R. Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel Burdick
Published in: UAI (2020)
Keyphrases
- reinforcement learning
- Markov chain Monte Carlo
- Metropolis-Hastings
- posterior distribution
- function approximation
- sampling strategy
- reinforcement learning algorithms
- random sampling
- learning algorithm
- model-free
- machine learning
- sampling algorithm
- posterior probability
- state space
- optimal policy
- Markov decision processes
- probability distribution
- dynamic programming
- temporal difference
- data sets
- policy search
- sampling strategies
- parameter estimation
- temporal difference learning
- multi-agent
- Bayesian networks
- active learning
- proposal distribution
- sampled data
- learning process
- probabilistic model
- generative model
- Monte Carlo
- user preferences