Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm.
Ashish Kumar Jayant, Shalabh Bhatnagar
Published in: NeurIPS (2022)
Keyphrases
- optimization algorithm
- reinforcement learning
- optimal policy
- model free
- policy search
- multi objective
- markov decision process
- action selection
- optimization method
- policy iteration
- evolutionary multi objective
- markov decision processes
- particle swarm optimization (PSO)
- function approximation
- reinforcement learning problems
- differential evolution
- function approximators
- partially observable environments
- partially observable
- control policy
- action space
- state action
- policy gradient
- actor critic
- state space
- hybrid optimization algorithm
- optimization strategy
- reinforcement learning algorithms
- global optima
- state and action spaces
- markov decision problems
- machine learning
- average reward
- particle swarm optimisation
- partially observable markov decision processes
- reward function
- dynamic programming
- evolutionary algorithm
- learning algorithm
- neural network
- artificial bee colony
- control parameters
- convergence speed
- inverse reinforcement learning
- multi objective optimization
- agent learns