Planning-integrated Policy for Efficient Reinforcement Learning in Sparse-reward Environments.
Christoper Wulur, Cornelius Weber, Stefan Wermter
Published in: IJCNN (2021)
Keyphrases
- reinforcement learning
- action selection
- optimal policy
- reward function
- partially observable
- partially observable environments
- reinforcement learning algorithms
- reinforcement learning problems
- policy gradient
- markov decision problems
- markov decision processes
- partially observable markov decision processes
- total reward
- eligibility traces
- function approximation
- policy search
- uncertain environments
- multi agent
- average reward
- control policies
- machine learning
- state space
- agent receives
- reward shaping
- inverse reinforcement learning
- control policy
- policy iteration
- markov decision process
- model free
- learning process