STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training.
Weihong Zhong, Mao Zheng, Duyu Tang, Xuan Luo, Heng Gong, Xiaocheng Feng, Bing Qin
Published in: CoRR (2023)
Keyphrases
- spatial temporal
- human actions
- action recognition
- spatio temporal
- spatial and temporal
- video shots
- spatial and temporal information
- space time interest points
- temporal information
- space time
- human activities
- video database
- video sequences
- spatial relationships
- spatial information
- video retrieval
- d objects
- vehicle license plate
- video data