ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models.
Chunjiang Ge
Sijie Cheng
Ziming Wang
Jiale Yuan
Yuan Gao
Jun Song
Shiji Song
Gao Huang
Bo Zheng
Published in:
CoRR (2024)
Keyphrases
data sets
prior knowledge
visual features
classification models
machine learning
low level
multi modal
statistical model
statistical models
visual information
bayesian framework
structural model
visual tasks