Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning

Haoran Xu, Xianyuan Zhan, Xiangyu Zhu

Published in: AAAI (2022)

Keyphrases

reinforcement learning
function approximation
reinforcement learning algorithms
state space
multi agent
model free
action selection
least squares
optimal policy
continuous state and action spaces
reinforcement learning methods
temporal difference learning
learning algorithm
loss function
temporal difference
cooperative
relational reinforcement learning
monte carlo
maximum likelihood
control problems
constraint satisfaction
multi agent reinforcement learning
stochastic approximation
markov decision process
neural network
multiagent learning
state action
transfer learning
constraint programming