Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning over a Finite-Time Horizon.
Matteo Basei
Xin Guo
Anran Hu
Yufei Zhang
Published in:
J. Mach. Learn. Res. (2022)
Keyphrases
optimal control
linear quadratic
reinforcement learning
worst case
regret bounds
dynamic programming
dynamical systems
state space
control strategy
lower bound
reward function
optimal policy
transfer learning
loss function
machine learning
pairwise
learning algorithm
model selection
image analysis
gaussian model