Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning.
Haoran Xu, Xianyuan Zhan, Xiangyu Zhu. Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- model free
- temporal difference learning
- learning algorithm
- state space
- multi agent
- stochastic approximation
- optimal policy
- constraint satisfaction
- maximum likelihood
- real time
- temporal difference
- relational reinforcement learning
- control problems
- learning agent
- learning problems
- markov decision processes
- learning process
- eligibility traces
- least squares
- supervised learning
- multiagent learning
- machine learning
- function approximators
- dynamic programming
- reward function
- action selection
- loss function
- optimal control
- constraint programming
- temporal difference methods
- exploration strategy
- policy search
- continuous state and action spaces
- continuous state
- reinforcement learning methods
- partially observable
- linear programming
- cooperative
- neural network