Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition.
Long-Fei Li
Peng Zhao
Zhi-Hua Zhou
Published in:
AISTATS (2024)
Keyphrases
improved algorithm
discriminant function
Markov decision processes
reinforcement learning
random sampling
optimization strategy
relevance feedback
Gaussian distribution
factored MDPs
state space
optimal policy
global optimization