Multimodal embedding fusion for robust speaker role recognition in video broadcast.
Mickael Rouvier, Sebastien Delecraz, Benoît Favre, Meriem Bendris, Frédéric Béchet
Published in: ASRU (2015)
Keyphrases
- face biometrics
- audio visual
- multimodal biometrics
- multimedia
- multimodal fusion
- robust recognition
- recognition rate
- partial occlusion
- object recognition
- video data
- human face recognition
- visual speech
- video sequences
- video frames
- video streams
- pattern recognition
- video content
- tv broadcast
- video analysis
- multi modal
- digital television
- information fusion
- key frames
- data fusion
- speech recognition
- feature extraction
- hidden markov models
- speaker identification
- news video
- noisy environments
- fusion method
- digital video
- recognition algorithm
- activity recognition
- action recognition