Speaker disentanglement in video-to-speech conversion.
Dan Oneata, Adriana Stan, Horia Cucu
Published in: CoRR (2021)
Keyphrases
- speech recognition
- audio visual
- automatic speech recognition
- speaker recognition
- audio stream
- speaker verification
- broadcast news
- speaker diarization
- speaker identification
- multimedia
- speech signal
- video sequences
- visual speech
- visual data
- video streams
- video data
- prosodic features
- content based video retrieval
- vocal tract
- automatic speech recognition systems
- video clips
- hidden markov models
- video content
- speaker dependent
- audio video
- video frames
- video retrieval
- vector quantization
- speech sounds
- space time
- pattern recognition
- speech recognizer
- video analysis
- temporal information
- video search
- real time
- digital audio
- natural language descriptions
- video database
- speech synthesis
- language model
- noisy environments