Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost.

Zhong Zheng, Haochen Zhang, Lingzhou Xue

Published in: CoRR (2024)

Keyphrases

communication cost
worst case
distributed data
sensor networks
regret bounds
lower bound
dynamic programming
reinforcement learning
communication overhead
upper bound
processing cost
network size
optimal solution
reduce communication cost
function approximation
data distribution
online learning
state space
multi agent
learning algorithm
feature extraction
minimax regret
neural network