Policy Gradient RL Algorithms as Directed Acyclic Graphs.
Juan Jose Garau Luis. Published in: CoRR (2020)
Keyphrases
- directed acyclic graph
- policy gradient
- RL algorithms
- reinforcement learning
- model free
- optimal control
- learning problems
- actor critic
- average reward
- learning algorithm
- temporal difference learning
- stochastic games
- random variables
- reinforcement learning algorithms
- function approximation
- directed graph
- adaptive control
- state space
- machine learning algorithms
- function approximators
- supervised learning
- dynamic programming
- learning tasks
- multi agent
- long run
- Markov decision processes
- single agent
- machine learning
- neural network
- Markov chain
- gradient method