CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.

Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng
Published in: INTERSPEECH (2022)
Keyphrases
  • cross modal
  • text to speech synthesis
  • multi modal
  • text to speech
  • visual recognition
  • multimedia retrieval
  • perceptual information
  • high level
  • image retrieval
  • knn
  • visual information
  • multimedia databases