Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning.
Junseo Lee, Jaeseok Heo, Dohyeong Kim, Gunmin Lee, Songhwai Oh
Published in: IROS (2023)
Keyphrases
- actor critic
- reinforcement learning
- temporal difference
- policy gradient
- approximate dynamic programming
- optimal control
- neuro fuzzy
- reinforcement learning algorithms
- function approximation
- policy iteration
- gradient method
- learning algorithm
- dynamic programming
- rl algorithms
- markov decision processes
- state space
- adaptive control
- control problems
- action selection
- machine learning
- learning capabilities
- average reward
- step size
- temporal difference learning
- linear program
- transfer learning
- sufficient conditions
- supervised learning
- natural actor critic