Deep Reinforcement Learning Techniques For Solving Hybrid Flow Shop Scheduling Problems: Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C).
Abdulrahman Nahhas, Andrey Kharitonov, Klaus Turowski
Published in: HICSS (2022)
Keyphrases
- actor critic
- scheduling problem
- flowshop
- reinforcement learning
- policy gradient
- temporal difference
- approximate dynamic programming
- reinforcement learning algorithms
- optimal control
- gradient method
- policy iteration
- single machine
- function approximation
- combinatorial optimization
- Markov decision problems
- setup times
- NP-hard
- processing times
- neuro fuzzy
- optimal policy
- job shop
- tabu search
- Markov decision processes
- policy gradient methods
- natural actor critic
- special case
- optimization problems
- model free
- average reward
- sequence dependent setup times
- job shop scheduling problem
- optimization methods
- action selection
- parallel machines
- optimization algorithm
- linear program
- dynamic programming
- function approximators
- rl algorithms
- machine learning
- dynamical systems
- approximation methods
- state action
- state space
- step size
- optimization method
- reward function