Bellman Residual Orthogonalization for Offline Reinforcement Learning.
Andrea Zanette, Martin J. Wainwright
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- policy iteration
- policy evaluation
- Markov decision processes
- model free
- sample path
- function approximation
- temporal difference
- least squares
- optimal policy
- reinforcement learning algorithms
- machine learning
- fixed point
- learning algorithm
- neural network
- transfer learning
- supervised learning
- state space
- optimal solution
- multi agent
- genetic algorithm