Some algorithms for correlated bandits with non-stationary rewards: Regret bounds and applications.

Prathamesh Mayekar, Nandyala Hemachandra

Published in: CODS (2016)

Keyphrases

non-stationary
adaptive algorithms
regret bounds
multi-armed bandit
learning algorithm
multi-armed bandits
computational complexity
reinforcement learning
worst case
mutual information
Markov decision processes
bandit problems