CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.

Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng
Published in: CoRR (2023)
Keyphrases
  • cross modal
  • text to speech synthesis
  • multi modal
  • perceptual information
  • multimedia retrieval
  • visual recognition
  • image retrieval
  • multimedia databases