Video-Language Alignment via Spatio-Temporal Graph Transformer.
Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin
Published in: CoRR (2024)
Keyphrases
- spatio temporal
- spatial and temporal
- space time
- spatial temporal
- video representation
- human actions
- spatio temporally
- video data
- multimedia
- temporal domain
- video content
- natural language
- video sequences
- connected components
- moving objects
- video frames
- video streams
- programming language
- graph model
- graph representation
- language learning
- structured data
- video retrieval
- fuzzy logic
- rewriting rules
- real time
- graph structure
- temporal segmentation
- video database
- spatial and temporal relationships
- video analysis
- graph theory
- directed graph
- temporal information
- bipartite graph
- video clips
- fault diagnosis
- weighted graph
- dynamic textures
- motion trajectories
- spatio temporal data
- graph matching
- spatio temporal databases
- image sequences
- event detection
- high voltage