Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability.
Hanlin Zhu, Amy Zhang
Published in: CoRR (2023)
Keyphrases
- function approximation
- reinforcement learning
- function approximators
- temporal difference
- optimal policy
- temporal difference learning algorithms
- reinforcement learning algorithms
- temporal difference learning
- mountain car
- actor critic
- reinforcement learning problems
- policy gradient
- radial basis function
- learning tasks
- model free
- policy search
- markov decision processes
- temporal difference methods
- policy evaluation
- markov decision process
- dynamic programming
- approximate dynamic programming
- td methods
- action selection
- natural actor critic
- exploration exploitation tradeoff
- markov decision problems
- neural network
- action space
- infinite horizon