Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition.

Published in: CoRR (2023)

Keyphrases