Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation.
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Published in: CoRR (2023)
Keyphrases
- general purpose
- denoising
- audio visual
- broadcast news
- audio stream
- text to speech
- speech recognition
- speaker identification
- speech processing
- audio signals
- emotion recognition
- speech music discrimination
- cepstral features
- spoken documents
- prosodic features
- natural images
- total variation
- image denoising
- recognition engine
- speaker recognition
- linear predictive coding
- signal processing
- speech synthesis
- acoustic signals
- audio recordings
- digital audio
- speech signal
- spoken language
- special purpose
- automatic transcription
- domain specific
- hearing impaired
- probabilistic model
- audio video
- spontaneous speech
- content based video retrieval
- multi stream
- multi modal
- visual features
- audio features
- wavelet packet