UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution.
Gengrui Zhang, Yao Wang, Xiaoshuang Chen, Hongyi Qian, Kaiqiao Zhan, Ben Wang
Published in: AAAI (2024)
Keyphrases
- multistage
- recommender systems
- long term
- reinforcement learning
- optimal policy
- Markov decision processes
- dynamic programming
- short term
- collaborative filtering
- single stage
- production system
- stochastic optimization
- multi-agent
- reward function
- stochastic programming
- function approximation
- reinforcement learning algorithms
- state space
- lot sizing
- attack detection
- bidirectional
- model-free
- transfer learning
- control policy
- user preferences
- average cost
- decision problems
- action space
- production line
- learning algorithm
- machine learning