Dynamic Spectrum Aggregation and Access Scheme Based on Multi-Agent Actor-Critic Reinforcement Learning.
Wenjiao Ding, Wensheng Zhang, Deqiang Wang, Jian Sun, Cheng-Xiang Wang
Published in: WCSP (2021)
Keyphrases
- reinforcement learning
- actor critic
- multi agent
- temporal difference
- function approximation
- policy gradient
- reinforcement learning algorithms
- state space
- optimal control
- approximate dynamic programming
- Markov decision processes
- gradient method
- policy iteration
- action selection
- model free
- dynamic environments
- single agent
- neuro fuzzy
- learning algorithm
- control problems
- temporal difference learning
- average reward
- policy gradient methods
- transfer learning
- dynamic programming
- partially observable Markov decision processes
- adaptive control
- learning problems
- rl algorithms
- supervised learning
- search space