VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning.

Published in: CoRR (2022)
