Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models.

Published in: CoRR (2024)

Keyphrases