Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization.
Fang Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
Published in:
Trans. Mach. Learn. Res. (2024)
Keyphrases
markov decision processes
nearest neighbor
e learning
optimal policy
quadratic programming