CMA-CLIP: Cross-Modality Attention Clip for Text-Image Classification.
Jinmiao Fu, Shaoyuan Xu, Huidong Liu, Yang Liu, Ning Xie, Chien-Chih Wang, Jia Liu, Yi Sun, Bryan Wang
Published in: ICIP (2022)
Keyphrases
- image classification
- video clips
- video segments
- information retrieval
- feature extraction
- visual features
- data sets
- database
- keywords
- low level features
- text retrieval
- semantic information
- visual words
- natural language generation
- visual attention
- textual data
- focus of attention
- text documents
- bag of words
- web documents
- video data
- text mining
- video sequences
- multiscale
- high level
- data mining