Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond.
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C. S. Lui, Wei Chen
Published in: CoRR (2024)
Keyphrases
- multi-armed bandits
- reinforcement learning
- multi-armed bandit
- bandit problems
- function approximation
- model free
- Markov decision processes
- reinforcement learning algorithms
- state space
- temporal difference
- optimal policy
- learning process
- Monte Carlo
- machine learning
- supervised learning
- optimal control
- action selection
- special case
- action space
- lower bound
- learning algorithm