A fixed-point policy-iteration-type algorithm for symmetric nonzero-sum stochastic impulse games.
Diego Zabaljauregui
Published in: CoRR (2019)
Keyphrases
- fixed point
- policy iteration
- Bellman residual
- approximate value iteration
- learning algorithm
- sample path
- objective function
- model free
- optimal solution
- sufficient conditions
- optimal policy
- Markov decision processes
- convergence rate
- temporal difference learning
- dynamic programming
- linear programming
- Monte Carlo
- least squares
- probabilistic model
- Markov decision problems
- cost function