Thompson Sampling in Non-Episodic Restless Bandits.
Young Hun Jung, Marc Abeille, Ambuj Tewari
Published in: CoRR (2019)
Keyphrases
- semi-Markov
- multi-armed bandit
- optimal control
- random sampling
- real time
- stochastic systems
- sampling strategies
- sampling strategy
- Monte Carlo
- multi-armed bandits
- sampling methods
- sampling algorithm
- sample size
- dynamic programming
- evolutionary algorithm
- training set
- reinforcement learning
- decision trees
- computer vision