Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons

Banghua Zhu, Michael I. Jordan, Jiantao Jiao

Published in: ICML (2023)

Keyphrases

pairwise
reinforcement learning
higher order
markov random field
loss function
learning algorithm
state space
multi-class
semi-supervised
artificial intelligence
human subjects
human operators
similarity measure
real time
markov decision processes
pairwise interactions
function approximation
optimal control
robotic control
multi-agent
tutorial dialogue
computational models
spectral clustering