Environment Upgrade Reinforcement Learning for Non-Differentiable Multi-Stage Pipelines.
Shuqin Xie, Zitian Chen, Chao Xu, Cewu Lu
Published in: CVPR (2018)
Keyphrases
- multistage
- reinforcement learning
- dynamic programming
- optimal policy
- single stage
- production system
- lot sizing
- stochastic optimization
- stochastic programming
- mobile robot
- finite horizon
- Markov decision processes
- function approximation
- dynamic environments
- objective function
- machine learning
- loss function
- linear programming
- learning agent
- multi agent
- attack detection
- lot streaming