Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization.
Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li
Published in: CoRR (2021)
Keyphrases
- learning algorithm
- learning process
- reinforcement learning
- learning systems
- online learning
- optimization algorithm
- machine learning
- kernel machines
- partially observable
- action selection
- optimal policy
- neural network
- global optimization
- inductive inference
- active learning
- prior knowledge
- inverse reinforcement learning
- partially observable environments
- policy gradient methods