CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
Published in:
INTERSPEECH (2022)
Keyphrases
cross modal
text to speech synthesis
multi modal
text to speech
visual recognition
multimedia retrieval
perceptual information
high level
image retrieval
knn
visual information
multimedia databases