SI-LSTM: Speaker Hybrid Long-short Term Memory and Cross Modal Attention for Emotion Recognition in Conversation.

Xingwei Liang, You Zou, Ruifeng Xu

Published in: CoRR (2023)

Keyphrases

cross modal
recurrent neural networks
long short term memory
emotion recognition
audio visual
multi modal
visual data
neural network
feed forward
visual information
artificial neural networks
facial expressions
human computer interaction
sentiment analysis
image retrieval
multimedia retrieval
natural language
image annotation
facial images
text classification
information fusion
high dimensional
feature space
affective states
low level
video data
video sequences
multimedia