Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning.
Arpan Kusari, Jonathan P. How
Published in: CoRR (2019)
Keyphrases
- multi objective
- reinforcement learning
- reward function
- policy search
- Markov decision processes
- evolutionary algorithm
- dynamic programming
- optimal policy
- initially unknown
- Markov decision process
- Pareto optimal
- optimal control
- multi objective optimization
- reinforcement learning algorithms
- state space
- genetic algorithm
- multi agent
- partially observable
- control policy
- multiple agents
- optimal solution
- transition model
- objective function
- inverse reinforcement learning
- function approximation
- prior knowledge
- search algorithm
- control policies
- transition probabilities
- average cost
- hidden Markov models
- data mining
- average reward