A Payoff-Based Policy Gradient Method in Stochastic Games with Long-Run Average Payoffs.

Junyue Zhang, Yifen Mu

Published in: CoRR (2024)

Keyphrases

long run
repeated games
stochastic games
average reward
gradient method
average cost
infinite horizon
optimal policy
short run
Markov decision processes
convergence rate
subgame perfect equilibrium
control policy
finite horizon
step size
policy iteration
Nash equilibrium
non-negative matrix factorization
genetic algorithm
optimization methods
Nash equilibria
game theory
Markov decision process
convergence speed
incomplete information
dynamical systems
sufficient conditions
state space
dynamic programming