Preference-based Reinforcement Learning with Finite-Time Guarantees.

Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

Published in: NeurIPS (2020)

Keyphrases

reinforcement learning
state and action spaces
function approximation
state space
markov decision processes
reinforcement learning algorithms
model free
reinforcement learning methods
temporal difference learning
finite number
optimal control
temporal difference
transfer learning
machine learning
data sets
optimal policy
user preferences
active learning
learning environment
learning capabilities
partially observable
function approximators
theoretical guarantees
finite automata
objective function
policy search
data mining