Preference-Guided Reinforcement Learning for Efficient Exploration.

Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Xuyang Chen, Lin Zhao

Published in: CoRR (2024)

Keyphrases

reinforcement learning
function approximation
reinforcement learning algorithms
state space
optimal policy
user preferences
robotic control
preference elicitation
Markov decision processes
temporal difference
direct policy search
multi-agent reinforcement learning
machine learning
temporal difference learning
stochastic approximation
model-free
soft constraints
data sets
learning process
multi-agent
neural network
real robot
function approximators
learning agents
dynamical systems
learning algorithm
real world
qualitative preferences