MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis.
Qian Yang, Jialong Zuo, Zhe Su, Ziyue Jiang, Mingze Li, Zhou Zhao, Feiyang Chen, Zhefeng Wang, Baoxing Huai
Published in: CoRR (2024)
Keyphrases
- speech synthesis
- speech recognition
- text to speech
- vocal tract
- prosodic features
- 3D scene
- outdoor images
- speech corpus
- photo collections
- three dimensional
- scene understanding
- single image
- image set
- scene recognition
- object detectors
- scene images
- outdoor scenes
- video sequences
- visual data
- benchmark datasets
- complex scenes
- indoor and outdoor scenes
- real scenes
- audio visual
- human actions
- visual features
- language model
- moving objects
- image sequences