Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment.
Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan
Published in: CoRR (2024)
Keyphrases
- temporal filtering
- temporal analysis
- video data
- spatio temporal
- temporal correlation
- video sequences
- space time
- natural language
- structural information
- programming language
- multimedia
- video frames
- language learning
- video analysis
- video streams
- real time
- video content
- semantic representations
- spatial temporal
- video database
- digital video
- higher level
- video processing
- information retrieval
- spatial and temporal
- online video
- motion estimation
- structural analysis
- surveillance videos
- structural features
- motion vectors
- video clips
- event detection