Analysis of Measure-Valued Derivatives in a Reinforcement Learning Actor-Critic Framework.

Kim van den Houten, Emile van Krieken, Bernd Heidergott
Published in: WSC (2022)
Keyphrases
  • reinforcement learning
  • actor-critic
  • function approximation
  • temporal difference
  • Markov decision processes
  • multi-agent
  • optimal control
  • reinforcement learning algorithms
  • temporal difference learning