Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret.
Lai Wei, Vaibhav Srivastava
Published in: CoRR (2021)
Keyphrases
- non-stationary
- minimax regret
- multi-armed bandit
- stochastic programming
- preference elicitation
- fractional Brownian motion
- utility function
- reinforcement learning
- reward function
- decision problems
- multistage
- linear program
- regret bounds
- optimal policy
- misclassification costs
- training and test data
- random fields
- active learning
- concept drift
- computational complexity
- bandit problems
- maximum entropy
- robust optimization
- feature selection
- incomplete information
- test data
- dynamic programming
- training data
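For context on the paper's topic, the following is a minimal sketch of the classical (stationary) UCB1 index policy on a simulated Bernoulli bandit. It is not the paper's nonstationary policies; the arm means, horizon, and helper function `ucb1` are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Illustrative UCB1 sketch on a simulated Bernoulli bandit.

    arm_means: assumed true success probabilities (for simulation only).
    Returns (total reward, pull counts per arm) over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative observed reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            # UCB index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

With a clear gap between arm means, the policy concentrates its pulls on the better arm, which is the behavior whose regret the paper analyzes under nonstationarity.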