u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.
Wei-Ning Hsu, Bowen Shi
Published in: NeurIPS (2022)
Keyphrases
- multi modal
- speech recognition
- labeled data
- transfer learning
- audio visual
- speech signal
- speech synthesis
- medical images
- unsupervised learning
- semi supervised learning
- training data
- modal logic
- active learning
- unlabeled data
- speaker recognition
- supervised learning
- knowledge transfer
- training examples
- unified model
- automatic speech recognition
- cross domain
- prior knowledge
- language acquisition
- spoken language
- text to speech
- vocal tract
- training set
- semi supervised
- non stationary
- multi stream
- data sets
- recognition engine
- endpoint detection