TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding.

Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
Published in: CoRR (2023)