VLCAP: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning.

Published in: ICIP (2022)

Keyphrases