Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs.
Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche
Published in: NeurIPS (2021)
Keyphrases
- multi objective
- state and action spaces
- optimal policy
- markov decision processes
- markov decision process
- reinforcement learning
- policy search
- markov decision problems
- state space
- optimization algorithm
- multi objective optimization
- finite horizon
- average cost
- real time
- evolutionary algorithm
- genetic algorithm
- average reward
- partially observable
- policy iteration
- multiple objectives
- finite number
- constrained optimization
- particle swarm optimization
- reward function
- action space
- nsga ii
- queueing networks
- decision processes
- decision problems
- differential evolution
- stationary policies
- dynamic programming
- reinforcement learning problems
- neural network