Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation.

Chang Che, Qunwei Lin, Xinyu Zhao, Jiaxin Huang, Liqiang Yu
Published in: CoRR (2024)
Keyphrases