Clipped-Objective Policy Gradients for Pessimistic Policy Optimization.
Jared Markowitz, Edward W. Staley
Published in: CoRR (2023)
Keyphrases
- optimal policy
- optimization algorithm
- neural network
- asymptotically optimal
- expected cost
- action selection
- global optimization
- reinforcement learning
- supply chain
- decision makers
- multi-agent
- image sequences
- decision process
- information retrieval
- machine learning
- real time
- Markov decision process
- state dependent
- direct search