A joint audio-visual approach to audio localization.
Jesper Rindom Jensen, Mads Græsbøll Christensen
Published in: ICASSP (2015)
Keyphrases
- audio visual
- multi modal
- visual information
- audio features
- visual data
- emotion recognition
- audio visual speech recognition
- multimedia
- multi stream
- multimodal fusion
- temporal context
- audio visual content
- speaker verification
- domain knowledge
- spatio temporal
- person authentication
- image classification
- text mining
- metadata