Bridging the gap between QP-based and MPC-based RL.
Shambhuraj Sawant, Sebastien Gros
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- quadratic programming
- maximum margin
- optimal control
- linear programming
- closed loop
- dynamic model
- decomposition algorithm
- model free
- rate control
- reinforcement learning algorithms
- markov decision processes
- partially observable domains
- function approximation
- optimal policy
- support vector machine
- state space
- multi agent
- learning algorithm
- visual quality
- temporal difference
- markov decision process
- dynamical systems
- video sequences
- optimal solution
- action space
- learning agents
- rl algorithms
- machine learning