Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP.
Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi. Published in: CoRR (2024)
Keyphrases
- image representation
- low level features
- image classification
- multiscale
- bag of words
- image content
- image features
- object recognition
- feature representations
- visual words
- quadtree
- bag of features
- representation scheme
- feature space
- sparse representation
- sparse coding
- text mining
- image retrieval
- scene classification
- spatial pyramid
- image classification and retrieval
- gaussian mixture modeling
- visual content
- high level
- cbir systems
- region segmentation
- video clips
- hierarchical structure
- image processing