Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation.
Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee
Published in: CoRR (2024)
Keyphrases
- noisy environments
- speech recognition
- speech enhancement
- automatic speech recognition
- speech signal
- image noise
- endpoint detection
- text-to-speech
- noise reduction
- audio-visual
- emotion recognition
- recognition engine
- speech synthesis
- multimodal interfaces
- salt pepper
- greater robustness
- dialogue system
- spoken dialogue systems
- speaker identification
- broadcast news
- robust statistical
- spoken language
- signal-to-noise ratio
- speech quality
- audio stream
- geometric distortions
- language acquisition
- language model