Intermittent Communications in Decentralized Shadow Reward Actor-Critic.
Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Junyu Zhang
Published in: CDC (2021)
Keyphrases
- actor critic
- policy gradient
- reinforcement learning
- average reward
- multi agent
- reinforcement learning algorithms
- optimal control
- approximate dynamic programming
- markov decision processes
- gradient method
- function approximation
- temporal difference
- cooperative
- single agent
- policy iteration
- long run
- approximation methods
- optimal policy
- model free
- reward function
- neuro fuzzy
- stochastic games
- state space
- machine learning
- variance reduction
- control strategy
- partially observable markov decision processes
- dynamical systems
- optimal solution
- decision making
- learning algorithm