VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models.
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Published in:
CoRR (2024)
Keyphrases
multi modal
multi step
high dimensional
machine learning
video search
cross modal
multi modality
audio visual
semantic concepts
uni modal
support vector
image classification
fusing multiple