Speaker disentanglement in video-to-speech conversion.
Dan Oneata, Adriana Stan, Horia Cucu
Published in: EUSIPCO (2021)
Keyphrases
- speech recognition
- audio visual
- automatic speech recognition
- speaker recognition
- speaker identification
- audio stream
- multimedia
- speaker verification
- video data
- broadcast news
- prosodic features
- audio video
- audio features
- video sequences
- speech synthesis
- content based video retrieval
- speaker diarization
- speech signal
- visual data
- video content
- real time
- speaker dependent
- video frames
- automatic speech recognition systems
- hidden markov models
- visual speech
- automatic transcription
- space time
- synthesized speech
- audio signals
- video streams
- language model
- vocal tract
- video analysis
- video clips
- spontaneous speech
- spatial and temporal
- video surveillance
- video retrieval
- digital audio
- speech recognition systems
- vector quantization
- video shots
- speech recognizer
- digital video
- probabilistic model
- emotion recognition