Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond.
Daniel Jiang, Haipeng Luo, Chu Wang, Yingfei Wang
Published in: KDD (2021)
Keyphrases
- multi armed bandits
- reinforcement learning
- decision making
- multi armed bandit
- action selection
- bandit problems
- function approximation
- decision makers
- electronic commerce
- reinforcement learning algorithms
- markov decision processes
- temporal difference
- data mining
- state space
- learning algorithm
- machine learning
- decision process
- model free
- dynamic programming
- utility function
- markov decision process
- active learning