A conditional random field approach for audio-visual people diarization.
Paul Gay, Elie Khoury, Sylvain Meignier, Jean-Marc Odobez, Paul Deléglise
Published in: ICASSP (2014)
Keyphrases
- audio visual
- multi modal
- visual information
- visual data
- person authentication
- speaker verification
- speaker diarization
- multi stream
- emotion recognition
- multimedia
- temporal context
- audio visual speech recognition
- audio features
- speaker identification
- databases
- natural language processing
- feature vectors
- three dimensional
- multimodal fusion