Stateful Offline Contextual Policy Evaluation and Learning.

Nathan Kallus, Angela Zhou

Published in: AISTATS (2022)

Keyphrases

learning algorithm
reinforcement learning
learning process
learning tasks
dynamic programming
least squares
cost function
mixture model
domain independent
temporal difference