LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression.
Jieneng Chen
Luoxin Ye
Ju He
Zhao-Yang Wang
Daniel Khashabi
Alan L. Yuille
Published in:
CoRR (2024)
Keyphrases
multi modal
image annotation
visual context
high dimensional
audio visual
probabilistic model
video search
semantic concepts
machine learning
image processing
similarity measure