Cross-modal collaborative feature representation via Transformer-based multimodal mixers for RGB-T crowd counting.
Weihang Kong, Jiayu Liu, Yao Hong, He Li, Jienan Shen
Published in: Expert Syst. Appl. (2024)
Keyphrases
- cross modal
- feature representation
- multi modal
- multimedia retrieval
- feature extraction
- low dimensional
- multimedia databases
- feature set
- face recognition
- visual recognition
- sparse representation
- image retrieval
- visual data
- high dimensional
- visual similarity
- feature selection
- feature descriptors
- multimedia data
- high level
- machine learning
- image analysis