Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework.
Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Zhen Zhao
Published in: Speech Commun. (2022)
Keyphrases
- multi modal
- fusion framework
- attention mechanism
- multiscale
- audio visual
- visual attention
- fusion process
- saliency map
- natural images
- image fusion
- high dimensional
- combining multiple
- edge detection
- image segmentation
- image processing
- eye tracking
- data fusion
- image representation
- vision system
- visual attention model
- wavelet coefficients
- audio features
- wavelet transform
- image sequences