Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders.

Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He
Published in: CoRR (2022)
Keyphrases
  • contextual information
  • speech recognition
  • semantic information
  • data sets
  • similarity measure
  • keywords
  • sensor networks
  • dimensionality reduction
  • context sensitive
  • source localization