A Payoff-Based Policy Gradient Method in Stochastic Games with Long-Run Average Payoffs.
Junyue Zhang, Yifen Mu
Published in: CoRR (2024)
Keyphrases
- long run
- repeated games
- stochastic games
- average reward
- gradient method
- average cost
- infinite horizon
- optimal policy
- short run
- markov decision processes
- convergence rate
- subgame perfect equilibrium
- control policy
- finite horizon
- step size
- policy iteration
- nash equilibrium
- negative matrix factorization
- genetic algorithm
- optimization methods
- nash equilibria
- game theory
- markov decision process
- convergence speed
- incomplete information
- dynamical systems
- sufficient conditions
- state space
- dynamic programming