Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards.

Aadirupa Saha, Pierre Gaillard, Michal Valko

Published in: CoRR (2020)

Keyphrases

multi armed bandits
stochastic systems
stochastic optimization
multi agent
bandit problems
reward function
markov decision processes
reinforcement learning
stochastic model
monte carlo
multi armed bandit
goal directed
stochastic processes
decision theoretic
reasoning about actions
free riding
multiarmed bandit