Distributed Consensus-Based Multi-Agent Off-Policy Temporal-Difference Learning.
Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
Published in: CDC (2021)
Keyphrases
- multi agent
- temporal difference learning
- reinforcement learning
- function approximation
- fixed point
- multi agent systems
- game playing
- temporal difference
- evaluation function
- approximate value iteration
- reinforcement learning algorithms
- least squares
- Markov decision process
- state space
- machine learning
- model free
- policy iteration
- learning environment