Multimodal Analysis for Deep Video Understanding with Video Language Transformer.
Beibei Zhang, Yaqun Fang, Tongwei Ren, Gangshan Wu
Published in: ACM Multimedia (2022)
Keyphrases
- video analysis
- video sequences
- video data
- multimedia
- visual analysis
- video streams
- video content
- real time video
- video clips
- real time
- temporal analysis
- multi modal
- video frames
- digital video
- video database
- video retrieval
- spatial and temporal
- space time
- statistical analysis
- data analysis
- event recognition
- language learning
- future development
- online video
- dynamic scenes
- video surveillance
- key frames
- event detection
- human computer interaction
- genetic algorithm