X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Published in: CoRR (2023)