TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding.
Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
Published in: EMNLP (Findings) (2023)
Keyphrases
- language understanding
- spatial and temporal
- temporal correlation
- space time
- spatio temporal
- temporal resolution
- video frames
- temporal information
- temporal continuity
- temporal redundancy
- natural language understanding
- video sequences
- video data
- dialogue system
- language processing
- general knowledge
- semantic interpretation
- expert systems