X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.

Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Published in: CoRR (2023)
Keyphrases
  • cross modal
  • perceptual information
  • object recognition
  • high dimensional
  • multi modal
  • information retrieval
  • web pages
  • high level
  • image sequences
  • video sequences
  • video analysis