Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback.

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

Published in: NeurIPS (2023)

Keyphrases

learning algorithm
learning process
optimal solution
supervised learning
learning tasks
long run