Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards.
Omar Besbes, Yonatan Gur, Assaf J. Zeevi
Published in: CoRR (2014)
Keyphrases
- non stationary
- bandit problems
- exploration exploitation
- reinforcement learning
- adaptive algorithms
- total reward
- autoregressive
- optimal solution
- markov decision processes
- empirical mode decomposition
- high level
- decision problems
- feature selection
- image sequences
- feature space
- special case
- wavelet transform
- machine learning