Constraints Penalized Q-learning for Safe Offline Reinforcement Learning.
Haoran Xu, Xianyuan Zhan, Xiangyu Zhu
Published in: AAAI (2022)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- state space
- multi agent
- model free
- action selection
- least squares
- optimal policy
- continuous state and action spaces
- reinforcement learning methods
- temporal difference learning
- learning algorithm
- loss function
- temporal difference
- cooperative
- relational reinforcement learning
- Monte Carlo
- maximum likelihood
- control problems
- constraint satisfaction
- multi agent reinforcement learning
- stochastic approximation
- Markov decision process
- neural network
- multiagent learning
- state action
- transfer learning
- constraint programming