Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems.
Tianchi Cai, Shenliao Bao, Jiyan Jiang, Shiji Zhou, Wenpeng Zhang, Lihong Gu, Jinjie Gu, Guannan Zhang
Published in: CoRR (2023)
Keyphrases
- model free reinforcement learning
- recommender systems
- reinforcement learning
- policy gradient
- collaborative filtering
- function approximation
- state space
- optimal control
- user profiles
- reinforcement learning algorithms
- matrix factorization
- gradient method
- average reward
- multi agent
- partially observable Markov decision processes
- long run
- optimal policy
- dynamic programming
- computational complexity
- Markov decision processes
- transfer learning
- Markov chain
- learning algorithm