Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback.
Teng Xiao, Suhang Wang
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- user feedback
- policy search
- state space
- Markov decision process
- human operators
- learning to rank
- control policy
- reward function
- implicit feedback
- control policies
- learning algorithm
- multi agent reinforcement learning
- reinforcement learning agents
- creative problem solving
- policy gradient methods
- function approximation
- ranking algorithm
- relevance feedback
- temporal difference
- action selection
- human behavior
- ranking functions
- human judgments
- decision problems
- Markov decision problems
- web search
- multi agent
- machine learning
- neural network