Keyphrases
- audio-visual
- emotion recognition
- speech recognition
- multimodal
- speech signal
- spoken language
- text-to-speech
- video retrieval
- automatic speech recognition
- speech synthesis
- speaker verification
- visual perception
- emotional state
- neural network
- color vision
- human perception
- multimedia
- endpoint detection
- visual data
- audio stream
- data sets
- content-based video retrieval
- artificial intelligence
- speech processing
- speaker recognition
- probabilistic model
- speaker identification
- broadcast news
- facial expressions
- language acquisition
- fine-grained
- multimedia content