Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback.
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
Published in: NeurIPS (2023)
Keyphrases
- Markov decision processes
- low rank
- reinforcement learning
- optimal policy
- learning process
- state space
- learning tasks
- learning algorithm
- dynamic programming
- multistage
- model based reinforcement learning
- decision theoretic planning
- policy iteration
- partially observable
- learning problems
- linear combination
- data points
- feature extraction