Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.
Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi
Published in: INTERSPEECH (2020)
Keyphrases
- vector quantization
- text to speech
- speech synthesis
- vector quantizer
- speaker recognition
- image compression
- finite state vector quantization
- vector quantized
- speech recognition
- image coding
- image reconstruction
- reconstructed image
- prosodic features
- audio visual
- fractal image coding
- entropy constrained
- distortion measure
- multi stream
- bag of words
- fundamental frequency
- codebook generation
- synthesized speech
- multiscale
- speech signal
- hidden markov models
- three dimensional
- reconstruction method
- automatic speech recognition
- image representation
- codebook design
- high resolution
- training set