A Bandit Learning Method for Continuous Games Under Feedback Delays with Residual Pseudo-Gradient Estimate.
Yuanhanqing Huang, Jianghai Hu
Published in: CDC (2023)
Keyphrases
- high accuracy
- prior knowledge
- detection method
- learning scheme
- learning algorithm
- unsupervised learning
- pairwise
- active learning
- cost function
- edge detection
- online learning
- clustering method
- computational complexity
- significant improvement
- dynamic programming
- learning process
- model selection
- decision problems
- convergence rate
- objective function