Distributed Model-Free Policy Iteration for Networks of Homogeneous Systems.
Shahriar Talebi, Siavash Alemzadeh, Mehran Mesbahi
Published in: CDC (2021)
Keyphrases
- model free
- policy iteration
- reinforcement learning
- Markov decision processes
- sample path
- function approximation
- least squares
- temporal difference
- reinforcement learning algorithms
- policy evaluation
- average reward
- optimal policy
- feature extraction
- fixed point
- machine learning
- state space
- finite state
- temporal difference learning
- learning algorithm