Prototypical speaker-interference loss for target voice separation using non-parallel audio samples.
Seongkyu Mun, Dhananjaya Gowda, Jihwan Lee, Changwoo Han, Dokyun Lee, Chanwoo Kim
Published in: INTERSPEECH (2022)
Keyphrases
- prosodic features
- audio visual
- emotion recognition
- text to speech
- speaker verification
- speaker identification
- mel frequency cepstral coefficients
- multimedia
- speech synthesis
- audio stream
- training set
- speaker recognition
- speech recognition
- audio features
- voice activity detection
- sample set
- parallel processing
- training samples
- multi modal
- multipath
- acoustic features
- visual information
- automatic transcription
- synthesized speech
- music information retrieval
- target detection
- target tracking
- parallel implementation
- target object
- spectral features
- speaker diarization
- pattern recognition