Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost.
Zhong Zheng, Haochen Zhang, Lingzhou Xue
Published in: CoRR (2024)
Keyphrases
- communication cost
- worst case
- distributed data
- sensor networks
- regret bounds
- lower bound
- dynamic programming
- reinforcement learning
- communication overhead
- upper bound
- processing cost
- network size
- optimal solution
- reduce communication cost
- function approximation
- data distribution
- online learning
- state space
- multi agent
- learning algorithm
- feature extraction
- minimax regret
- neural network