VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation.

Alekh Agarwal, Yujia Jin, Tong Zhang

Published in: CoRR (2022)

Keyphrases

model free
function approximation
reinforcement learning
reinforcement learning algorithms
temporal difference
average reward
radial basis function
rl algorithms
learning tasks
policy iteration
state space
temporal difference learning
dynamic programming
temporal difference methods
neural network
function approximators
learning process
td learning
policy evaluation
control policy
least squares
reward function
optimal policy
collaborative filtering