Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization.
Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held
Published in: ICRA (2024)
Keyphrases
- reinforcement learning
- markov decision processes
- optimal policy
- markov decision process
- state space
- reinforcement learning algorithms
- optimization problems
- action space
- optimization algorithm
- state and action spaces
- global optimization
- function approximation
- model free
- dynamic programming
- multi agent
- markov decision problems
- temporal difference
- trajectory data
- partially observable
- learning algorithm
- reward function
- embedded systems
- learning problems
- optimization method
- policy iteration
- search algorithm
- planning under uncertainty
- objective function
- neural network
- action sets