Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model.
Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu
Published in: AAAI (2022)
Keyphrases
- generative model
- Markov decision process
- reinforcement learning
- prior knowledge
- discriminative learning
- optimal policy
- state space
- probabilistic model
- learning process
- learned models
- learning algorithm
- learning tasks
- active learning
- EM algorithm
- action space
- infinite horizon
- policy iteration
- training data
- finite horizon
- Dirichlet process mixture models
- reward function
- hidden variables
- topic models
- particle filter