Bi-personal stochastic transient Markov games with stopping times and total reward criterion.

Victor Manuel Martínez-Cortés

Published in: Kybernetika (2021)

Keyphrases

total reward
reinforcement learning algorithms
reinforcement learning
Markov decision processes
multiagent reinforcement learning
state space
optimality criterion
steady state
model free
Monte Carlo
learning algorithm
optimal policy
temporal difference
function approximation
action selection
average reward
stochastic games
dynamic environments
machine learning
reward function
dynamic programming