Q-Probe: A Lightweight Approach to Reward Maximization for Language Models.

Published in: CoRR (2024)

Keyphrases