Smoothing policies and safe policy gradients.
Matteo Papini, Matteo Pirotta, Marcello Restelli
Published in: Mach. Learn. (2022)
Keyphrases
- optimal policy
- control policies
- access control policies
- management policies
- Markov decision process
- transport systems
- policy search
- allocation policy
- reinforcement learning
- revenue management
- Markov decision processes
- conflict resolution
- state space
- allocation policies
- Markov decision problems
- policy gradient methods
- decision processes
- partially observable Markov decision processes
- finite horizon
- dynamic programming
- decision problems
- long run
- control policy
- state dependent
- finite state
- reward function
- smoothing methods
- access control
- asymptotically optimal
- infinite horizon
- scheduling policies
- expected reward
- holding cost
- security policies
- approximate policy iteration
- privacy policies
- smoothing algorithm
- optimal pricing
- expected cost
- optimal production
- average reward
- selective perception