Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP.

Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi

Published in: CoRR (2024)

Keyphrases

image representation
low level features
image classification
multiscale
bag of words
image content
image features
object recognition
feature representations
visual words
quadtree
bag of features
representation scheme
feature space
sparse representation
sparse coding
text mining
image retrieval
scene classification
spatial pyramid
image classification and retrieval
gaussian mixture modeling
visual content
high level
cbir systems
region segmentation
video clips
hierarchical structure
image processing