Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization.
Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- Markov decision processes
- optimal policy
- state space
- Markov decision process
- global optimization
- optimization process
- action space
- optimization problems
- optimization algorithm
- machine learning
- average reward
- function approximation
- embedded systems
- multi agent
- state and action spaces
- optimal control
- dynamic programming
- finite state
- model free
- temporal difference
- reward function
- state abstraction
- learning algorithm
- state action
- Bayesian reinforcement learning
- learning problems
- policy search
- planning under uncertainty
- reinforcement learning methods
- neural network
- policy iteration
- particle swarm optimization
- dynamical systems
- decision problems