Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning.

Published in: CoRR (2024)
