Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues.
Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu
Published in: CoRR (2024)
Keyphrases
- audio visual
- multimodal fusion
- temporal segmentation
- multi modal
- visual information
- emotion recognition
- visual data
- multi stream
- multimedia
- audio features
- temporal context
- audio visual speech recognition
- video scene
- speaker verification
- image segmentation
- information extraction
- visual features
- multiscale
- image classification
- high dimensional
- data sets