Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition.
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li. Published in: ICLR (2023)
Keyphrases
- Markov decision processes
- reinforcement learning
- model based reinforcement learning
- dynamic programming
- transition matrices
- partially observable
- optimal policy
- state space
- learning algorithm
- learning tasks
- finite state
- finite horizon
- decision processes
- stochastic games
- Markov chain
- Markov decision process
- partially observed
- state abstraction