Event-Specific Audio-Visual Fusion Layers: A Simple and New Perspective on Video Understanding.
Arda Senocak, Junsik Kim, Tae-Hyun Oh, Dingzeyu Li, In So Kweon
Published in: WACV (2023)
Keyphrases
- video frames
- audio visual
- video data
- multi modal
- video summarization
- video streams
- person authentication
- video content
- multimodal fusion
- video scene
- visual information
- visual data
- multimedia
- high level
- key frames
- multi stream
- sports video
- temporal context
- audio visual content
- event detection
- multimedia data
- low level
- spatio temporal
- data analysis