DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing.
Neha Sahipjohn, Ashishkumar P. Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah
Published in: CoRR (2024)
Keyphrases
- text to speech
- multimodal interaction
- speech synthesis
- multimedia
- text to speech synthesis
- prosodic features
- programming tool
- video sequences
- control system
- video data
- multi modal
- real time
- floor control
- video frames
- video streams
- video analysis
- audio visual
- english text
- digital video
- visual speech
- video retrieval
- writing skills
- video shots
- word processing
- video database
- video clips
- control strategy
- video content
- multimedia data
- speech recognition
- space time