Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation.
Yunheng Li, Zhongyu Li, Quansheng Zeng, Qibin Hou, Ming-Ming Cheng
Published in: CoRR (2024)
Keyphrases
- semantic segmentation
- street scenes
- object categories
- conditional random fields
- superpixels
- label transfer
- weakly supervised
- computer vision
- scene classification
- PASCAL VOC
- natural language
- image processing
- object classes
- vision system
- face detection
- object recognition
- generative model
- image understanding
- image classification
- bounding box
- object detection
- object class
- small number
- higher order
- semi-supervised
- information extraction
- hidden Markov models
- multiscale