Distilling Vision-Language Pre-training to Collaborate with Weakly-Supervised Temporal Action Localization.

Published in: CoRR (2022)
