MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment.

Anurag Das, Xinting Hu, Li Jiang, Bernt Schiele

Published in: CoRR (2024)

Keyphrases

semantic segmentation
street scenes
superpixels
conditional random fields
label transfer
weakly supervised
scene classification
text mining
natural language
object classes
bounding box
object categories
information retrieval
image understanding
long range
PASCAL VOC
object class
object recognition
keywords
semantic information
natural images
input image
feature vectors
image segmentation
computer vision
machine learning