Δ-OPE: Off-Policy Estimation with Pairs of Policies.

Olivier Jeunen, Aleksei Ustimenko

Published in: CoRR (2024)

Keyphrases

search algorithm
estimation algorithm
artificial intelligence
pairwise
optimal policy
learning algorithm
accurate estimation