Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback.
Haolin Liu, Chen-Yu Wei, Julian Zimmert
Published in: CoRR (2023)
Keyphrases
- regret bounds
- dynamic programming
- lower bound
- Markov decision processes
- worst case
- online learning
- linear regression
- optimal linear
- upper confidence bound
- multi-armed bandit
- expert advice
- semi-infinite programming
- bandit problems
- reinforcement learning
- closed form
- piecewise linear
- reward function
- Bregman divergences
- average reward
- random sampling
- Grassmann manifold
- relevance feedback
- upper bound
- optimal solution