Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies.
Bram De Cooman, Johan A. K. Suykens, Andreas Ortseifen
Published in: LOD (2) (2022)
Keyphrases
- optimal policy
- state dependent
- reinforcement learning
- discounted reward
- large deviations
- markov decision processes
- average reward
- initial state
- decision problems
- state space
- action space
- policy iteration
- markov decision process
- asymptotically optimal
- continuous state
- finite state
- finite horizon
- multistage
- dynamic programming
- long run
- action selection
- state action
- infinite horizon
- control policies
- reward shaping
- sufficient conditions
- fitted q iteration
- policy search
- reward function
- customer demand
- lower bound
- optimal production
- lost sales
- stationary distribution
- neural network
- average cost
- multi agent
- steady state
- inventory level