Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.

Published in: CoRR (2023)

Keyphrases