MSER: Multimodal speech emotion recognition using cross-attention with deep fusion.
Mustaqeem Khan, Wail Gueaieb, Abdulmotaleb El-Saddik, Soonil Kwon
Published in: Expert Syst. Appl. (2024)
Keyphrases
- multimodal fusion
- audio visual
- multimodal interfaces
- emotion recognition
- human computer interaction
- multi modal
- multimodal interaction
- text to speech synthesis
- emotional speech
- information fusion
- multi stream
- visual information
- data fusion
- high robustness
- emotion classification
- speech recognition
- relevance feedback
- visual data
- deep learning
- user interface
- multimedia
- speech signal
- audio features
- speaker verification
- multimodal biometrics