Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models.

Xuanyu Lei, Zonghan Yang, Xinrui Chen, Peng Li, Yang Liu
Published in: CoRR (2024)
Keyphrases
  • multi modal
  • cross modal
  • vision system
  • multi modality
  • computer vision
  • audio visual
  • semantic concepts
  • image processing
  • multimedia
  • feature extraction
  • image annotation
  • humanoid robot
  • video search