SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.

Junjie Li, Yiwei Guo, Xie Chen, Kai Yu
Published in: ICASSP (2024)
Keyphrases
  • speech recognition
  • vector space
  • synthesized speech
  • neural network
  • audio visual
  • prosodic features
  • real time
  • language model
  • visual attention
  • nonlinear dimensionality reduction
  • text to speech