Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee
Published in: Interspeech (2021)
Keyphrases
- information fusion
- audio visual
- student learning
- cross modal
- multi modal
- visual data
- data fusion
- learning tools
- collaborative learning
- soft computing
- visual information
- image annotation
- learning process
- nearest neighbor
- video data
- image retrieval
- learning experience
- image processing
- image data
- visual content
- multiscale
- multimedia