Off-Policy Evaluation of Ranking Policies under Diverse User Behavior.
Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
Published in: KDD (2023)
Keyphrases
- user behavior
- policy evaluation
- implicit feedback
- optimal policy
- least squares
- policy iteration
- partially observable markov decision processes
- user clicks
- user interaction
- monte carlo
- markov decision processes
- temporal difference
- reinforcement learning
- ranking algorithm
- user preferences
- model free
- decision problems
- document relevance
- user browsing
- markov decision problems
- ranking functions
- user feedback
- web search
- semi parametric
- function approximation
- variance reduction
- click logs
- user behavior patterns
- finite state
- learning to rank
- fixed point
- infinite horizon
- reinforcement learning algorithms
- link analysis
- evaluation measures
- ranked list
- dynamical systems
- sufficient conditions
- markov chain
- click models
- linear programming
- state space
- dynamic programming