Large-Scale Unsupervised Audio Pre-Training for Video-to-Speech Synthesis.
Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
- speech synthesis
- prosodic features
- text to speech
- multimedia
- audio video
- speech recognition
- digital video
- supervised learning
- scene change detection
- video content analysis
- video data
- audio files
- multimedia processing
- visual data
- supervised training
- multimedia information
- video files
- video content
- video sequences
- unsupervised manner
- video frames
- video streams
- multimedia data
- digital audio
- classifier training
- unsupervised learning
- audio features
- video analysis
- closed captions
- audio stream
- visual information
- audio content
- story segmentation
- speech corpus
- video recordings
- vocal tract
- pattern recognition
- training set
- audio signals
- lecture videos
- audio signal
- broadcast news
- language model
- video database
- video retrieval
- video material
- deep architectures
- hidden markov models
- video clips
- event detection
- media streams
- neural network