Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons.

Banghua Zhu, Jiantao Jiao, Michael I. Jordan

Published in: CoRR (2023)

Keyphrases

pairwise
reinforcement learning
loss function
human operators
Markov random field
multi-class
higher order
neural network
belief propagation
multi-agent
function approximation
pairwise interactions
relevance feedback
point sets
spectral clustering
graph matching
real time
supervised learning
semi-supervised
active learning
transfer learning
machine learning
human interaction
temporal difference
robotic control