Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models.

Published in: CoRR (2022)
