Neural Temporal Difference and Q Learning Provably Converge to Global Optima.
Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Published in: Math. Oper. Res. (2024)
Keyphrases
- global optima
- temporal difference
- td learning
- function approximation
- reinforcement learning
- reinforcement learning algorithms
- model free
- action selection
- global optimization
- temporal difference learning
- optimization problems
- optimization algorithm
- temporal difference methods
- global optimum
- evaluation function
- policy iteration
- step size
- neural network
- global search
- premature convergence
- monte carlo
- control parameters
- function approximators
- actor critic
- state space
- policy evaluation
- radial basis function
- supervised learning
- pso algorithm
- reinforcement learning methods
- search algorithm
- reinforcement learning problems
- particle swarm
- convergence rate
- particle swarm optimization algorithm
- learning tasks
- learning rate
- machine learning
- genetic algorithm
- particle swarm optimization
- objective function
- feature space
- multi objective
- differential evolution
- cost function
- markov chain