Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models.
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
Published in:
CoRR (2024)
Keyphrases
multi modal
cross modal
vision system
multi modality
computer vision
audio visual
semantic concepts
image processing
multimedia
feature extraction
image annotation
humanoid robot
video search