Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition.
Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
Published in: CoRR (2024)
Keyphrases
- improved algorithm
- discriminant function
- Markov decision processes
- optimization strategy
- reinforcement learning
- state space
- grey prediction model
- original version
- closed form
- finite horizon
- relevance feedback
- dynamic programming
- multi agent
- optimal policy
- Markov decision problems
- artificial neural networks
- factored MDPs