SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.
Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson
Published in: ICASSP (2022)
Keyphrases
- text to speech
- voice activity detection
- emotion recognition
- speech synthesis
- fundamental frequency
- speech recognition errors
- speech sounds
- speech quality
- speech recognition
- speech signal
- unsupervised learning
- prosodic features
- vocal tract
- semi supervised
- recognition engine
- noisy environments
- synthesized speech
- automatic speech recognition
- endpoint detection
- data driven
- database workloads
- restricted Boltzmann machine
- dialogue system
- supervised learning
- spoken document retrieval
- spectral features
- rule selection
- natural language
- machine learning
- speaker identification
- spoken language
- parameter tuning
- supervised classification
- parameter settings
- text to speech synthesis
- neural network