SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.

Junjie Li, Yiwei Guo, Xie Chen, Kai Yu

Published in: ICASSP (2024)

Keyphrases

speech recognition
vector space
synthesized speech
neural network
audio visual
prosodic features
real time
language model
visual attention
nonlinear dimensionality reduction
text to speech