Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning.
Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian
Published in: NeurIPS (2022)
Keyphrases
- policy gradient
- multi-objective
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- objective function
- evolutionary algorithm
- optimization algorithm
- optimal control
- model free reinforcement learning
- policy gradient methods
- single agent
- Markov decision processes
- genetic algorithm
- average reward
- variance reduction
- temporal difference
- function approximators
- particle swarm optimization
- gradient method
- neural network
- reinforcement learning methods
- approximation methods
- least squares
- supervised learning
- approximate dynamic programming
- machine learning
- search space
- long run
- evaluation function
- multi agent
- learning algorithm
- learning tasks
- optimal policy