Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching.

Published in: CoRR (2021)
