SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification.
Fang Peng, Xiaoshan Yang, Changsheng Xu
Published in: CoRR (2022)
Keyphrases
- language model
- image classification
- visual features
- low level features
- language modeling
- visual information
- key frames
- semantic content
- n gram
- document retrieval
- speech recognition
- language modelling
- image annotation
- statistical language models
- probabilistic model
- bag of words
- smoothing methods
- test collection
- retrieval model
- computer vision
- feature extraction
- semantic information
- query expansion
- image features
- information retrieval
- image representation
- relevance model
- vector space model
- language models for information retrieval
- context sensitive
- visual content
- ad hoc information retrieval
- video shots
- video sequences
- low level
- query terms
- video clips
- news video
- multi label
- high level
- visual words
- pseudo relevance feedback
- keywords
- semantic similarity
- image retrieval
- spoken term detection