Mel-S3R: Combining Mel-spectrogram and self-supervised speech representation with VQ-VAE for any-to-any voice conversion.
Jichen Yang, Yi Zhou, Hao Huang
Published in: Speech Commun. (2023)
Keyphrases
- speech signal
- cepstral coefficients
- emotion recognition
- vector quantization
- Wigner distribution
- speech quality
- speaker recognition
- speech recognition
- image compression
- speech recognition errors
- text to speech
- speech synthesis
- fundamental frequency
- automatic speech recognition
- pattern analysis
- hidden Markov models
- speech sounds
- broadcast news
- audio visual
- voice activity detection