Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization.
Fang Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
quadratic programming
probabilistic model
distance measure
online learning
Markov decision processes