Prosody Modeling with 3D Visual Information for Expressive Video Dubbing.
Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia
Published in: INTERSPEECH (2023)
Keyphrases
- visual information
- visual data
- video database
- visual cues
- audio visual
- visual features
- video segments
- low level
- temporal information
- content based image retrieval systems
- visual content
- video data
- human visual system
- textual information
- visual information retrieval
- image collections
- eye movements
- visual input
- multimedia
- real time
- content based image
- video content
- semantic concepts
- visual concepts
- visual descriptors
- video streams
- multi modal
- artificial intelligence
- video retrieval
- key frames
- video frames
- relational databases
- feature selection
- computer vision
- databases