Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs.

Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang
Published in: CoRR (2023)
Keyphrases
  • multi modal
  • multi modality
  • high dimensional
  • audio visual
  • cross modal
  • image annotation
  • computer vision
  • video sequences
  • visual information
  • computer assisted
  • humanoid robot