Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon.
Matteo Basei, Xin Guo, Anran Hu, Yufei Zhang
Published in: CoRR (2020)
Keyphrases
- optimal control
- linear quadratic
- reinforcement learning
- worst case
- dynamical systems
- regret bounds
- dynamic programming
- lower bound
- reward function
- state space
- loss function
- control strategy
- machine learning
- learning algorithm
- image sequences
- transfer learning
- Markov decision processes
- closed loop
- supervised learning
- color images
- vector valued
- image processing