Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization.
Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li
Published in: AAAI (2022)
Keyphrases
- learning algorithm
- learning systems
- state dependent
- action selection
- dynamic model
- learning process
- state action
- knowledge acquisition
- partially observable environments
- genetic algorithm
- recurrent networks
- partially observable
- inductive inference
- global optimization
- mobile learning
- optimal policy
- unsupervised learning
- online learning
- collaborative learning
- optimization problems
- supervised learning
- state space
- hidden markov models
- prior knowledge
- reinforcement learning