An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit processes.
Prokopis C. Prokopiou
Peter E. Caines
Aditya Mahajan
Published in:
CCECE (2015)
Keyphrases
multi armed bandit
regret bounds
multi armed bandits
reinforcement learning
lower bound
closed form
bandit problems
parameter estimation
resource allocation
linear regression
decentralized decision making
data mining
learning algorithm
optimal solution
loss function