Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Ming Yin, Yu-Xiang Wang

Published in: NeurIPS (2021)

Keyphrases

reinforcement learning
model free
dynamic programming
function approximation
average reward
optimal control
state space
average reward reinforcement learning
control policy
markov decision processes
total reward
approximate dynamic programming
reinforcement learning algorithms
data sets
learning process
learning environment
eligibility traces
partially observable environments
initially unknown
reinforcement learning methods
optimal strategy
average cost
partially observable
reward function
learning classifier systems
transfer learning
optimal policy
supervised learning
machine learning