Smoothing Policies and Safe Policy Gradients.
Matteo Papini, Matteo Pirotta, Marcello Restelli
Published in: CoRR (2019)
Keyphrases
- optimal policy
- control policies
- access control policies
- management policies
- transport systems
- policy search
- Markov decision process
- revenue management
- control policy
- policy gradient methods
- allocation policies
- decision problems
- decision processes
- state space
- partially observable Markov decision processes
- finite horizon
- privacy policies
- allocation policy
- Markov decision processes
- Markov decision problems
- scheduling policies
- optimal production
- reward function
- access control
- state dependent
- dynamic programming
- infinite horizon
- long run
- optimal pricing
- total reward
- conflict resolution
- reinforcement learning
- smoothing methods
- policy iteration
- expected reward
- natural actor critic
- average cost
- image gradient
- expected cost
- production process
- finite state
- lost sales
- holding cost
- average reward
- decision process
- sufficient conditions
- selective perception