Mysterious Projections: Multimodal LLMs Gain Domain-Specific Visual Capabilities Without Richer Cross-Modal Projections.
Gaurav Verma, Minje Choi, Kartik Sharma, Jamelle Watson-Daniels, Sejoon Oh, Srijan Kumar
Published in: CoRR (2024)
Keyphrases
- cross modal
- multi modal
- domain specific
- visual similarity
- three dimensional
- perceptual information
- multimedia retrieval
- visual data
- visual recognition
- visual information
- multimedia databases
- high level
- learning tasks
- metadata
- information retrieval
- image retrieval
- low level
- image database
- high dimensional
- data management
- visual features
- feature selection