Interpreting CLIP's Image Representation via Text-Based Decomposition.
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
Published in: CoRR (2023)
Keyphrases
- image representation
- low level features
- image classification
- image content
- multiscale
- bag of words
- object recognition
- visual content
- image retrieval
- feature representations
- visual features
- bag of visual words
- visual words
- quadtree
- image search
- video clips
- representation scheme
- feature space
- image features
- scene classification
- scene recognition
- multimedia
- sparse representation
- pattern representation
- sparse coding
- wavelet packet
- receptive fields
- object detection
- co-occurrence
- image database
- bag of features
- query processing
- multiresolution
- region segmentation
- feature vectors
- high level
- computer vision
- spatial pyramid
- action recognition