Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems.
Marc Abeille
Alessandro Lazaric
Published in:
ICML (2018)
Keyphrases
control problems
optimal control
linear quadratic
reinforcement learning
multi-armed bandit
dynamic programming
control strategy
control law
regret bounds
multiscale
objective function
special case
graphical models
closed form
closed loop
latent variables