BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs.
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
Published in:
CoRR (2023)
Keyphrases
multi modal
cross modal
video search
single modality
multi modality
visual information
high dimensional
auto annotation
low level
semantic concepts
audio visual
visual cues
image annotation
feature selection
low contrast
humanoid robot
visual features
image classification
fusing multiple
uni modal
object recognition