TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding.
Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou
Published in: CoRR (2023)
Keyphrases
- language understanding
- spatial and temporal
- temporal correlation
- space time
- spatio temporal
- temporal resolution
- temporal redundancy
- language processing
- temporal information
- natural language understanding
- video data
- video frames
- video sequences
- temporal continuity
- video analysis
- spoken dialogue systems
- text mining
- domain knowledge
- artificial intelligence