Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification.
Zhuohua Li, Ruyun Wang, Fuqing Zhu, Jizhong Han, Songlin Hu
Published in: ICMR (2024)
Keyphrases
- cross modal
- multi label
- image classification
- visual recognition
- multi modal
- visual features
- image representation
- multi label classification
- visual similarity
- image annotation
- bag of words
- binary classification
- image features
- visual data
- image search
- feature extraction
- visual information
- visual words
- text categorization
- learning tasks
- image retrieval
- semantic concepts
- automatic image annotation
- data sets
- multi class
- multimedia databases
- graph cuts
- low level features
- learning algorithm
- data analysis
- web images
- machine learning
- visual content
- natural language processing
- active learning
- class labels
- low level
- relevance feedback