Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos.

Published in: CoRR (2023)
