Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons.
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
Published in: ICML (2023)
Keyphrases
- pairwise
- reinforcement learning
- higher order
- markov random field
- loss function
- learning algorithm
- state space
- multi class
- semi supervised
- artificial intelligence
- human subjects
- human operators
- similarity measure
- real time
- markov decision processes
- pairwise interactions
- function approximation
- optimal control
- robotic control
- multi agent
- tutorial dialogue
- computational models
- spectral clustering