Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies.
Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- state space
- reward function
- total reward
- function approximation
- markov decision processes
- fitted q iteration
- reinforcement learning agents
- control policies
- control policy
- transfer learning
- reinforcement learning algorithms
- dynamic programming
- multi agent
- markov decision problems
- action space
- model free
- hierarchical reinforcement learning
- policy gradient methods
- reinforcement learning methods
- linear constraints
- temporal difference
- global constraints
- finite state
- decision problems
- cooperative multi agent systems