AV2WAV: Diffusion-Based Re-Synthesis from Continuous Self-Supervised Features for Audio-Visual Speech Enhancement.
Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu
Published in: ICASSP (2024)
Keyphrases
- language model
- audio visual
- person authentication
- multi modal
- audio features
- probabilistic model
- visual information
- feature vectors
- information retrieval
- visual data
- low level
- speech enhancement
- feature set
- feature extraction
- multimedia
- multi stream
- co occurrence
- image features
- feature space
- semantic information
- prior knowledge
- pattern recognition
- multiscale
- image processing