Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods.
Zhengpeng Xie, Changdong Yu, Weizheng Qiao
Published in: CoRR (2023)
Keyphrases
- optimization methods
- reinforcement learning
- optimal policy
- optimization method
- simulated annealing
- control policy
- optimization problems
- policy search
- direct optimization
- gradient method
- function approximation
- optimization approaches
- global convergence
- Markov decision process
- unconstrained optimization
- policy gradient
- stochastic methods
- action selection
- trust region
- action space
- state space
- quasi-Newton
- actor-critic
- continuous optimization
- policy evaluation
- Markov decision processes
- machine learning
- Bayesian network models
- partially observable environments
- genetic algorithm
- efficient optimization
- variance reduction
- partially observable
- inverse problems
- policy iteration
- dynamic programming
- Markov decision problems
- portfolio optimization
- particle swarm
- state and action spaces
- evolutionary algorithm