Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework.
Kim van den Houten
Emile van Krieken
Bernd Heidergott
Published in:
WSC (2022)
Keyphrases
reinforcement learning
actor-critic
function approximation
temporal difference
Markov decision processes
multi-agent
optimal control
reinforcement learning algorithms
temporal difference learning