MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis.
Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann
Published in: CoRR (2024)
Keyphrases
- text to speech
- text to speech synthesis
- multimodal interaction
- speech synthesis
- multi modal
- prosodic features
- multimodal interfaces
- word processing
- affect detection
- average error
- rms error
- information retrieval
- emotional intelligence
- emotion recognition
- audio visual
- neural network
- medical images
- similarity measure
- multimodal data
- clinical setting
- image segmentation
- root mean square
- information systems
- social networks
- brain image analysis