MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling.
Diwei Huang, Kunyang Lin, Peihao Chen, Qing Du, Mingkui Tan
Published in: CoRR (2024)
Keyphrases
- audio visual
- multi modal
- temporal segmentation
- sports video
- visual information
- video summarization
- audio visual speech recognition
- visual features
- multi stream
- video sequences
- person authentication
- emotion recognition
- visual data
- multimodal fusion
- video data
- data sets
- key frames
- spatio temporal
- computer vision
- databases