Mouth2Audio: intelligible audio synthesis from videos with distinctive vowel articulation.
Saurabh Garg, Haoyao Ruan, Ghassan Hamarneh, Dawn M. Behne, Allard Jongman, Joan A. Sereno, Yue Wang
Published in: Int. J. Speech Technol. (2023)
Keyphrases
- multimedia
- visual data
- visual information
- signal processing
- prosodic features
- audio visual
- long video
- video material
- audio features
- video sequences
- audio recordings
- video content analysis
- audio video
- lecture videos
- multimedia information
- digital video
- video clips
- speaker identification
- text to speech
- video scene
- video frames
- video data