Offline Reinforcement Learning with Realizability and Single-policy Concentrability.

Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Published in: COLT (2022)

Keyphrases

reinforcement learning
optimal policy
policy search
state and action spaces
Markov decision processes
partially observable
partially observable environments
approximate dynamic programming
policy gradient
neural network
reward function
action selection
Markov decision problems
policy evaluation
function approximation
reinforcement learning problems
dynamical systems
multi-agent
partially observable domains