Bellman Residual Orthogonalization for Offline Reinforcement Learning.
Andrea Zanette, Martin J. Wainwright
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- policy iteration
- policy evaluation
- model free
- markov decision processes
- function approximation
- sample path
- least squares
- temporal difference
- optimal policy
- multi agent
- reinforcement learning algorithms
- state space
- fixed point
- markov decision process
- average reward
- hybrid algorithms
- monte carlo
- function approximators
- learning algorithm
- machine learning
- approximation methods
- markov decision problems