Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem.
Wesley Cowan
Michael N. Katehakis
Published in:
CoRR (2015)
Keyphrases
finite horizon
asymptotic optimality
regret bounds
asymptotically optimal
optimal policy
infinite horizon
Markov decision processes
multi-armed bandit
lower bound
multistage
machine learning
mathematical model
Markov decision process