Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning.
Simon Schrodi, David T. Hoffmann, Max Argus, Volker Fischer, Thomas Brox
Published in: CoRR (2024)
Keyphrases
- computer vision
- learning process
- prior knowledge
- object recognition
- relative position
- multi modal
- information processing
- 3D objects
- visual representation
- machine learning
- learning systems
- structured representation
- representation language
- learning algorithm
- complex objects
- context dependent
- background knowledge
- object oriented programming
- vision system
- active learning
- multiple representations
- computer based instruction