A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments.
Yang Yang, Jiang Li, Jinyong Hou, Ye Wang, Huadong Zhao
Published in: Sensors (2023)
Keyphrases
- complex environments
- k-means
- multi agent
- dynamic programming
- gradient ascent
- worst case
- search space
- optimal solution
- NP-hard
- objective function
- convergence rate
- learning algorithm
- single agent
- control system
- reinforcement learning
- simulated annealing
- Markov chain
- support vector machine (SVM)
- mathematical model
- policy gradient
- neural network