Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback.
Haolin Liu, Chen-Yu Wei, Julian Zimmert
Published in: ICLR (2024)
Keyphrases
- regret bounds
- multi armed bandit
- online learning
- reinforcement learning
- worst case
- markov decision processes
- lower bound
- upper confidence bound
- dynamic programming
- optimal linear
- bandit problems
- closed form
- optimal control
- multi agent
- semi infinite programming
- optimal solution
- finite horizon
- average cost
- piecewise linear
- relevance feedback
- grassmann manifold
- upper bound
- state space
- multi armed bandit problems