Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation.
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Published in: INTERSPEECH (2023)
Keyphrases
- general purpose
- denoising
- audio visual
- speech recognition
- audio stream
- text to speech
- audio signals
- broadcast news
- speech signal
- speaker identification
- speech synthesis
- speech processing
- audio features
- digital audio
- emotion recognition
- speaker recognition
- multi stream
- multimodal interfaces
- cepstral features
- automatic speech recognition
- linear predictive coding
- image denoising
- acoustic features
- application specific
- spontaneous speech
- spoken language
- natural images
- visual data
- special purpose
- acoustic signals
- multimodal interaction
- speech music discrimination
- human language
- noisy images
- image processing
- noisy environments
- total variation