A Bandit Learning Method for Continuous Games Under Feedback Delays with Residual Pseudo-Gradient Estimate.

Yuanhanqing Huang, Jianghai Hu

Published in: CDC (2023)

Keyphrases

high accuracy
prior knowledge
detection method
learning scheme
learning algorithm
unsupervised learning
pairwise
active learning
cost function
edge detection
online learning
clustering method
computational complexity
significant improvement
dynamic programming
learning process
model selection
decision problems
convergence rate
objective function