VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation.
Alekh Agarwal, Yujia Jin, Tong Zhang
Published in: COLT (2023)
Keyphrases
- function approximation
- model free
- reinforcement learning
- reinforcement learning algorithms
- temporal difference
- average reward
- dynamic programming
- radial basis function
- temporal difference learning
- learning tasks
- rl algorithms
- td learning
- policy iteration
- function approximators
- state space
- learning algorithm
- policy evaluation
- pattern recognition
- learning process
- reinforcement learning methods
- optimal control
- policy gradient
- learning problems
- temporal difference methods