Balancing Immediate Revenue and Future Off-Policy Evaluation in Coupon Allocation.
Naoki Nishimura
Ken Kobayashi
Kazuhide Nakata
Published in: CoRR (2024)
Keyphrases
policy evaluation
temporal difference
least squares
reinforcement learning
function approximation
monte carlo
model free
policy iteration
matrix inversion
optimal policy
variance reduction
neural network
dynamic programming
dynamical systems
semi parametric