Login / Signup
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding.
Shuhuai Ren
Sishuo Chen
Shicheng Li
Xu Sun
Lu Hou
Published in:
EMNLP (Findings) (2023)
Keyphrases
</>
language understanding
spatial and temporal
temporal correlation
space time
spatio temporal
temporal resolution
video frames
temporal information
temporal continuity
temporal redundancy
natural language understanding
video sequences
video data
dialogue system
language processing
general knowledge
semantic interpretation
expert systems