Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games.
Mikoto Kudo, Yohei Akimoto
Published in: CoRR (2024)
Keyphrases
- policy iteration
- optimal policy
- sample path
- Markov decision processes
- Nash equilibria
- game theory
- state dependent
- control policies
- Nash equilibrium
- reinforcement learning
- average reward
- decision problems
- finite horizon
- dynamic programming
- finite state
- model free
- policy evaluation
- infinite horizon
- Markov decision process
- fixed point
- state space
- long run
- multistage
- Markov decision problems
- least squares
- sufficient conditions
- temporal difference
- initial state
- Monte Carlo
- discounted reward
- long run average cost
- average cost
- optimal control
- graphical models