Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy.
Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Published in: NeurIPS (2019)
Keyphrases
- optimal policy
- trust region
- optimization methods
- finite horizon
- Markov decision processes
- state space
- line search
- infinite horizon
- reinforcement learning
- global optimum
- dynamic programming
- state dependent
- long run
- optimization method
- column generation
- optimization problems
- sufficient conditions
- neural network
- Markov decision process
- global convergence
- optimization algorithm
- average reward
- Newton method
- lost sales
- Markov decision problems
- inventory level
- constrained optimization
- quadratic programming
- least squares
- risk minimization
- special case