Off-Policy Exploitability-Evaluation and Equilibrium-Learning in Two-Player Zero-Sum Markov Games.
Kenshi Abe
Yusuke Kaneko
Published in:
CoRR (2020)
Keyphrases
learning algorithm
reinforcement learning
linear programming
Nash equilibrium