Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs.
Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche
Published in: CoRR (2021)
Keyphrases
- multi-objective
- state and action spaces
- Markov decision processes
- optimal policy
- multi-objective optimization
- reinforcement learning
- Markov decision process
- Markov decision problems
- optimization algorithm
- evolutionary algorithm
- policy search
- state space
- finite number
- genetic algorithm
- finite horizon
- partially observable
- trade-off
- real time
- average reward
- infinite horizon
- average cost
- particle swarm optimization
- NSGA-II
- decision processes
- objective function
- policy iteration
- reward function
- action space
- action selection
- long run
- conflicting objectives
- knapsack problem
- neural network