Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline.

Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
Published in: CoRR (2024)
Keyphrases