Using audio and visual cues for speaker diarisation initialisation.
Giulia Garau, Hervé Bourlard
Published in: ICASSP (2010)
Keyphrases
- visual cues
- visual information
- audio visual
- low level
- speaker identification
- visual data
- prosodic features
- visual features
- audio stream
- speaker verification
- emotion recognition
- mid level
- audio features
- lecture videos
- speech recognition
- multimedia
- multi modal
- acoustic features
- automatic transcription
- multiple visual cues
- speaker recognition
- eye movements
- semantic information
- depth cues
- audio signal
- broadcast news
- text to speech
- multiple cues
- gaussian mixture model
- speaker diarization
- high level