Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces.
Tatiana Tatarenko, Maryam Kamgarpour
Published in: CoRR (2018)
Keyphrases
- online learning
- action space
- bandit problems
- online algorithms
- upper confidence bound
- state space
- Markov decision processes
- reinforcement learning
- regret bounds
- least squares
- action selection
- skill learning
- state and action spaces
- continuous state
- stochastic processes
- linear regression
- real valued
- lower bound
- cooperative