Query-Policy Misalignment in Preference-Based Reinforcement Learning.
Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- optimal policy
- query processing
- response time
- database
- policy search
- action selection
- relevance feedback
- query evaluation
- keywords
- markov decision process
- query expansion
- function approximation
- partially observable environments
- control policies
- markov decision processes
- user queries
- database queries
- data sources
- policy iteration
- function approximators
- data structure
- query formulation
- state space
- reward function
- partially observable
- action space
- infinite horizon
- search engine
- learning algorithm