SI-LSTM: Speaker Hybrid Long-short Term Memory and Cross Modal Attention for Emotion Recognition in Conversation.
Xingwei Liang, You Zou, Ruifeng Xu
Published in: CoRR (2023)
Keyphrases
- cross modal
- recurrent neural networks
- long short term memory
- emotion recognition
- audio visual
- multi modal
- visual data
- neural network
- feed forward
- visual information
- artificial neural networks
- facial expressions
- human computer interaction
- sentiment analysis
- image retrieval
- multimedia retrieval
- natural language
- image annotation
- facial images
- text classification
- information fusion
- high dimensional
- feature space
- affective states
- low level
- video data
- video sequences
- multimedia