Convergent Distributed Actor-Critic Algorithm Based on Gradient Temporal Difference.
Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
Published in: EUSIPCO (2022)
Keyphrases
- actor critic
- temporal difference
- monte carlo
- optimization algorithm
- learning algorithm
- reinforcement learning
- td learning
- policy gradient
- dynamic programming
- gradient method
- search space
- objective function
- particle swarm optimization
- simulated annealing
- optimal control
- cost function
- decision problems
- neuro fuzzy
- model free
- reinforcement learning algorithms
- policy iteration
- evolutionary algorithm