An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning.
Wenjia Meng, Qian Zheng, Yue Shi, Gang Pan. Published in: IEEE Trans. Neural Networks Learn. Syst. (2022)
Keyphrases
- optimization method
- trust region
- global optimum
- optimization methods
- reinforcement learning
- optimization algorithm
- genetic algorithm
- simulated annealing
- global convergence
- particle swarm
- differential evolution
- evolutionary algorithm
- metaheuristic
- column generation
- function approximation
- nonlinear optimization
- state space
- optimization procedure
- function approximators
- nelder mead simplex
- lower bound
- newton method
- hybrid algorithm
- multi view