Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19.
Jesús Antonio Villalba López, Daniel Garcia-Romero, Nanxin Chen, Gregory Sell, Jonas Borgstrom, Alan McCree, Leibny Paola García-Perera, Saurabh Kataria, Phani Sankar Nidadavolu, Pedro Torres-Carrasquillo, Najim Dehak
Published in: Odyssey (2020)
Keyphrases
- visual data
- speaker recognition
- audio visual
- speaker verification
- visual information
- gaussian mixture model
- speaker identification
- probabilistic neural network
- high dimensional data
- vector quantization
- video data
- image data
- high dimensional
- multimedia data
- video sequences
- audio features
- speaker diarization
- visual features
- human motion
- speech recognition
- contextual information
- image sequences
- multimedia
- visual content
- speech signal
- emotion recognition
- human actions
- image content
- action recognition
- image database