Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards.

Varun Kanade, H. Brendan McMahan, Brent Bryan

Published in: AISTATS (2009)

Keyphrases

multi armed bandits
stochastic systems
bandit problems
reinforcement learning
multi armed bandit
reward shaping
markov decision processes
stochastic optimization
stochastic programming
expected reward
regret bounds
stochastic processes
monte carlo
online learning
fully observable
stochastic nature
domain specific
spatio temporal
expert systems
video sequences
credit assignment
multi agent