ViLT-CLIP: Video and Language Tuning CLIP with Multimodal Prompt Learning and Scenario-Guided Optimization.

Published in: AAAI (2024)

Keyphrases