Audio-Visual Fusion Layers for Event Type Aware Video Recognition.
Arda Senocak, Junsik Kim, Tae-Hyun Oh, Hyeonggon Ryu, Dingzeyu Li, In So Kweon. Published in: CoRR (2022)
Keyphrases
- audio visual
- multimodal fusion
- video summarization
- visual data
- person authentication
- multimedia
- multi modal
- video scene
- sports video
- audio features
- audio visual content
- event detection
- visual information
- temporal context
- object recognition
- gait recognition
- video data
- human activities
- pattern recognition
- multi stream
- visual speech
- multimodal biometrics
- video streams
- high dimensional
- video content
- video sequences
- metadata
- visual content
- human motion
- computer vision
- multimedia data
- activity recognition
- hidden markov models
- three dimensional
- audio visual speech recognition