Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
Israel D. Gebru, Sileye O. Ba, Xiaofei Li, Radu Horaud
Published in: IEEE Trans. Pattern Anal. Mach. Intell. (2018)
Keyphrases
- audio visual
- speaker diarization
- speaker verification
- multi modal
- visual information
- visual data
- speech recognition
- information fusion
- emotion recognition
- bayesian information criterion
- space time
- broadcast news
- multimedia
- audio features
- multimedia data
- gaussian mixture model
- model selection
- image data
- video sequences
- image sequences