Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces.

Tatiana Tatarenko, Maryam Kamgarpour

Published in: CoRR (2018)

Keyphrases

online learning
action space
bandit problems
online algorithms
upper confidence bound
state space
markov decision processes
reinforcement learning
regret bounds
least squares
action selection
skill learning
state and action spaces
continuous state
stochastic processes
linear regression
real valued
lower bound
cooperative