Offline Reinforcement Learning with Realizability and Single-policy Concentrability.
Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
Published in: COLT (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- state and action spaces
- Markov decision processes
- partially observable
- partially observable environments
- approximate dynamic programming
- policy gradient
- neural network
- reward function
- action selection
- Markov decision problems
- policy evaluation
- function approximation
- reinforcement learning problems
- dynamical systems
- multi-agent
- partially observable domains