HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification.

Published in: CoRR (2024)

Keyphrases