Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons.
Banghua Zhu, Jiantao Jiao, Michael I. Jordan
Published in: CoRR (2023)
Keyphrases
- pairwise
- reinforcement learning
- loss function
- human operators
- Markov random field
- multi class
- higher order
- neural network
- belief propagation
- multi agent
- function approximation
- pairwise interactions
- relevance feedback
- point sets
- spectral clustering
- graph matching
- real time
- supervised learning
- semi supervised
- active learning
- transfer learning
- machine learning
- human interaction
- temporal difference
- robotic control