Bi-personal stochastic transient Markov games with stopping times and total reward criterion.
Victor Manuel Martínez-Cortés
Published in: Kybernetika (2021)
Keyphrases
- total reward
- reinforcement learning algorithms
- reinforcement learning
- Markov decision processes
- multiagent reinforcement learning
- state space
- optimality criterion
- steady state
- model free
- Monte Carlo
- learning algorithm
- optimal policy
- temporal difference
- function approximation
- action selection
- average reward
- stochastic games
- dynamic environments
- machine learning
- reward function
- dynamic programming