Off-Policy Evaluation of Ranking Policies under Diverse User Behavior.
Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
Published in: CoRR (2023)
Keyphrases
- user behavior
- policy evaluation
- implicit feedback
- optimal policy
- least squares
- policy iteration
- partially observable markov decision processes
- user clicks
- markov decision processes
- temporal difference
- monte carlo
- reinforcement learning
- user interaction
- model free
- user preferences
- ranking algorithm
- document relevance
- markov decision problems
- markov decision process
- ranking functions
- learning to rank
- user feedback
- finite state
- semi parametric
- user behavior patterns
- variance reduction
- web search
- click models
- dynamic programming
- function approximation
- decision problems
- dynamical systems
- average cost
- statistical inference