Multimodal CLIP Inference for Meta-Few-Shot Image Classification

Constance Ferragu, Philomène Chagniot, Vincent Coyette

Published in: CoRR (2024)

Keyphrases

image classification
visual features
key frames
image features
image representation
low level features
multi modal
video sequences
bag of words
meta level
video shots
video data
probabilistic inference
class specific
Bayesian networks
feature extraction
visual words
Bayesian inference
multimodal interaction
feature selection
shot boundary detection
video clips
sparse coding
video content
feature vectors
image segmentation