Efficient Audio-Visual Speech Enhancement Using Deep U-Net With Early Fusion of Audio and Video Information and RNN Attention Blocks.
Jung-Wook Hwang, Rae-Hong Park, Hyung-Min Park
Published in: IEEE Access (2021)
Keyphrases
- audio visual
- visual data
- audio features
- multimedia
- multimodal fusion
- visual information
- semantic information
- multi modal
- multi stream
- audio visual speech recognition
- keywords
- visual features
- contextual information
- space time
- computer vision
- multimedia data
- video content
- text data
- nearest neighbor
- motion estimation
- audio visual content