Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning.
Sumeet Batra, Bryon Tjanaka, Matthew Christopher Fontaine, Aleksei Petrenko, Stefanos Nikolaidis, Gaurav S. Sukhatme
Published in: ICLR (2024)
Keyphrases
- policy gradient
- reinforcement learning
- actor critic
- function approximation
- reinforcement learning algorithms
- policy search
- optimal control
- state space
- model free reinforcement learning
- temporal difference
- variance reduction
- function approximators
- policy gradient methods
- approximate dynamic programming
- temporal difference learning
- average reward
- control problems
- real valued
- optimal policy
- dynamic programming