Hands-on Reinforcement Learning for Recommender Systems - From Bandits to SlateQ to Offline RL with Ray RLlib.
Christy D. Bergman, Kourosh Hakhamaneshi
Published in: RecSys (2022)
Keyphrases
- reinforcement learning
- recommender systems
- collaborative filtering
- multi-armed bandit
- state space
- reinforcement learning algorithms
- rl algorithms
- function approximation
- matrix factorization
- user preferences
- multi-armed bandits
- markov decision processes
- control problems
- learning process
- stochastic systems
- supervised learning
- user profiles
- action selection
- model free
- optimal policy
- temporal difference
- machine learning
- learning problems
- optimal control
- temporal difference learning
- cold start problem
- recommendation systems
- transfer learning
- autonomous learning
- continuous state
- user model
- reinforcement learning methods
- ray tracing
- action space
- markov decision process
- partially observable
- learning capabilities
- product recommendation
- function approximators
- policy evaluation
- actor critic
- exploration exploitation
- key concepts
- direct policy search