Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration.
Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
Published in: NeurIPS (2021)
Keyphrases
- multi objective
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- markov decision process
- evolutionary algorithm
- policy iteration
- multi objective optimization
- optimization algorithm
- reinforcement learning algorithms
- dynamic programming
- objective function
- multiple objectives
- partially observable markov decision processes
- multiobjective optimization
- heuristic search
- pareto optimal
- particle swarm optimization
- average reward
- conflicting objectives
- function approximation
- multi objective optimization problems
- genetic algorithm
- NSGA-II
- real time
- temporal difference
- partially observable
- bi objective
- finite state
- learning algorithm
- optimization problems
- markov chain
- linear program
- belief state
- reward function
- multi agent
- decision making
- model free
- belief space
- long run
- markov decision chains
- partially observable markov