Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability.
Hanlin Zhu, Amy Zhang
Published in: NeurIPS (2023)
Keyphrases
- function approximation
- reinforcement learning
- function approximators
- temporal difference
- reinforcement learning problems
- temporal difference learning
- optimal policy
- policy gradient
- actor critic
- model free
- temporal difference learning algorithms
- learning tasks
- radial basis function
- reinforcement learning algorithms
- policy evaluation
- temporal difference methods
- policy iteration
- policy search
- markov decision processes
- td learning
- mountain car
- state action
- approximate dynamic programming
- machine learning
- learning algorithm
- markov decision process
- training data
- markov decision problems
- average reward
- action space
- partially observable markov decision processes
- transfer learning
- supervised learning
- natural actor critic