Bellman Residual Orthogonalization for Offline Reinforcement Learning

Andrea Zanette, Martin J. Wainwright

Published in: CoRR (2022)

Keyphrases

reinforcement learning
policy iteration
policy evaluation
markov decision processes
model free
sample path
function approximation
temporal difference
least squares
optimal policy
reinforcement learning algorithms
machine learning
fixed point
learning algorithm
neural network
transfer learning
supervised learning
state space
optimal solution
multi agent
genetic algorithm