Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation.
Honghao Wei, Xin Liu, Lei Ying
Published in: AISTATS (2022)
Keyphrases
- model free
- reinforcement learning
- dynamic programming
- reinforcement learning algorithms
- function approximation
- learning algorithm
- neural network
- policy iteration
- space complexity
- machine learning
- policy evaluation
- worst case
- temporal difference
- regret minimization
- rl algorithms
- convergence rate
- learning problems
- supervised learning
- least squares
- search space
- optimal solution