Interpreting CLIP's Image Representation via Text-Based Decomposition.
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
Published in: ICLR (2024)
Keyphrases
- image representation
- low level features
- image classification
- multiscale
- object recognition
- visual content
- bag of words
- image content
- image features
- visual words
- representation scheme
- sparse coding
- image retrieval
- feature space
- quadtree
- image classification and retrieval
- receptive fields
- scene classification
- scene recognition
- visual features
- sparse representation
- image search
- feature representations
- image processing
- semantic information
- pattern representation
- bag of visual words
- compressive sensing
- face recognition
- keywords
- wavelet packet
- multimedia
- multiresolution
- scene categories
- vision system
- video clips