LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression.

Jieneng Chen, Luoxin Ye, Ju He, Zhao-Yang Wang, Daniel Khashabi, Alan L. Yuille
Published in: CoRR (2024)
Keyphrases
  • multi modal
  • image annotation
  • visual context
  • high dimensional
  • audio visual
  • probabilistic model
  • video search
  • semantic concepts
  • machine learning
  • image processing
  • similarity measure