Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung. Published in: CoRR (2022)
Keyphrases
- text to speech synthesis
- audio visual
- multimodal fusion
- emotion recognition
- text to speech
- multimodal interfaces
- multimodal interaction
- multi stream
- multi modal
- broadcast news
- speech synthesis
- story segmentation
- text graphics
- multimedia
- prosodic features
- spoken documents
- english text
- emotional speech
- emotional state
- human language
- speaker verification
- human computer interaction
- audio stream
- visual data
- visual information
- information retrieval
- multimodal information
- lexical features
- english words
- text recognition
- cross modal
- audio features
- multi lingual
- signal processing
- high robustness
- video search
- affect sensing
- facial expressions
- speaker identification
- text mining
- speaker recognition
- audio signals
- multimedia data
- emotion classification
- affect detection
- audio recordings
- keywords
- video retrieval
- affective states
- speech processing
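The technique named in the title, cross attention between aligned audio and text representations, can be sketched as scaled dot-product attention where text-side vectors act as queries over audio-side keys and values. This is a minimal illustrative sketch, not the paper's implementation; the function name, the choice of text as the query side, and the toy vectors are all assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    # Each query (e.g. a text-token vector, illustrative) attends over
    # keys/values (e.g. aligned audio-frame vectors); returns one fused
    # vector per query. Plain scaled dot-product attention, no learned
    # projections.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        fused = [sum(wj * v[i] for wj, v in zip(w, values))
                 for i in range(len(values[0]))]
        out.append(fused)
    return out

# Toy example: one text query, two audio frames.
text = [[1.0, 0.0]]
audio = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(text, audio, audio)
```

Because the query points toward the first audio frame, the fused vector weights that frame more heavily; in a full model, learned query/key/value projections would precede this step.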