Gated Multi-modal Fusion with Cross-modal Contrastive Learning for Video Question Answering.

Published in: ICANN (7) (2023)
