Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning.
Arpan Kusari, Jonathan P. How
Published in: ICRA (2020)
Keyphrases
- reinforcement learning
- multi objective
- reward function
- policy search
- markov decision processes
- dynamic programming
- reinforcement learning algorithms
- optimal policy
- initially unknown
- state space
- multi objective optimization
- pareto optimal
- evolutionary algorithm
- inverse reinforcement learning
- optimal control
- markov decision process
- function approximation
- control policy
- objective function
- transition model
- model free
- genetic algorithm
- continuous state
- control policies
- multi agent systems
- temporal difference
- nsga ii
- optimal solution
- average reward
- higher order
- transition probabilities
- multi criteria
- generative model
- sufficient conditions