UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution.
Gengrui Zhang, Yao Wang, Xiaoshuang Chen, Hongyi Qian, Kaiqiao Zhan, Ben Wang
Published in: CoRR (2024)
Keyphrases
- multistage
- recommender systems
- long term
- reinforcement learning
- optimal policy
- markov decision processes
- dynamic programming
- short term
- collaborative filtering
- stochastic programming
- single stage
- production system
- function approximation
- lot sizing
- stochastic optimization
- state space
- user preferences
- model free
- reinforcement learning algorithms
- reward function
- multi agent
- control policy
- attack detection
- decision problems
- markov decision process
- cold start problem
- assembly systems
- bi directional
- action space
- learning algorithm
- partially observable markov decision processes
- production line
- long run
- machine learning
- user profiles
- sufficient conditions
- decision making