Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy.
Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Published in: CoRR (2019)
Keyphrases
- optimal policy
- trust region
- optimization methods
- markov decision processes
- line search
- dynamic programming
- state space
- state dependent
- finite horizon
- reinforcement learning
- long run
- infinite horizon
- optimization problems
- global optimum
- sufficient conditions
- average reward
- markov decision process
- neural network
- optimization method
- policy iteration
- lost sales
- global convergence
- optimization algorithm
- simulated annealing
- levenberg marquardt
- markov decision problems
- column generation
- constrained optimization
- objective function
- convergence rate
- machine learning
- reward function
- combinatorial optimization