Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards.
Varun Kanade, H. Brendan McMahan, Brent Bryan
Published in: AISTATS (2009)
Keyphrases
- multi armed bandits
- stochastic systems
- bandit problems
- reinforcement learning
- multi armed bandit
- reward shaping
- markov decision processes
- stochastic optimization
- stochastic programming
- expected reward
- regret bounds
- stochastic processes
- monte carlo
- online learning
- fully observable
- stochastic nature
- domain specific
- spatio temporal
- expert systems
- video sequences
- credit assignment
- multi agent