Distributed Model-Free Policy Iteration for Networks of Homogeneous Systems.

Shahriar Talebi, Siavash Alemzadeh, Mehran Mesbahi

Published in: CDC (2021)

Keyphrases

model free
policy iteration
reinforcement learning
Markov decision processes
sample path
function approximation
least squares
temporal difference
reinforcement learning algorithms
policy evaluation
average reward
optimal policy
feature extraction
fixed point
machine learning
state space
finite state
temporal difference learning
learning algorithm