VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models.

Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Zhongyu Wei
Published in: CoRR (2024)
Keyphrases
  • multi modal
  • multi step
  • high dimensional
  • machine learning
  • video search
  • cross modal
  • multi modality
  • audio visual
  • semantic concepts
  • uni modal
  • support vector
  • image classification
  • fusing multiple