Query-Policy Misalignment in Preference-Based Reinforcement Learning.
Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- optimal policy
- response time
- database
- policy search
- query processing
- data structure
- Markov decision processes
- policy iteration
- state space
- Markov decision process
- action selection
- partially observable
- query formulation
- query evaluation
- decision problems
- reinforcement learning problems
- multi agent
- keywords
- function approximators
- Markov decision problems
- policy evaluation
- state and action spaces
- reward function
- partially observable domains
- query terms
- function approximation
- range queries
- query expansion
- dynamic programming
- result set
- temporal difference
- user preferences
- user interaction
- relevance feedback
- data sources
- approximate dynamic programming
- relational databases
- database systems
- learning algorithm