Publication: VLRM: Vision-Language Models act as Reward Models for Image Captioning.