Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text.
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung. Published in: INTERSPEECH (2020)
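The paper's method is not detailed on this page; as a rough illustration of the technique named in the title, the sketch below shows generic cross-attention fusion of aligned audio and text features in PyTorch. All names, dimensions, and the pooling and classifier choices are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic cross-attention fusion sketch: word-level text features
    attend over time-aligned frame-level audio features. Illustrative
    only, not the paper's exact model."""

    def __init__(self, dim: int = 128, n_heads: int = 4, n_classes: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, n_words, dim)   -- e.g. word embeddings
        # audio: (batch, n_frames, dim)  -- e.g. projected acoustic frames
        # Each text token queries the audio sequence (cross attention).
        fused, _ = self.cross_attn(query=text, key=audio, value=audio)
        # Mean-pool over words, then predict the utterance-level emotion.
        return self.classifier(fused.mean(dim=1))

# Usage with random stand-in features (2 utterances, 12 words, 50 frames).
model = CrossModalFusion()
logits = model(torch.randn(2, 12, 128), torch.randn(2, 50, 128))
print(logits.shape)  # torch.Size([2, 4])
```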
Keyphrases
- audio-visual
- text-to-speech synthesis
- emotion recognition
- text-to-speech
- multimodal fusion
- multimodal interfaces
- multimodal interaction
- multimodal
- multi-stream
- emotional speech
- spoken documents
- multimedia
- text graphics
- human language
- speech synthesis
- prosodic features
- visual information
- broadcast news
- speaker verification
- audio stream
- story segmentation
- human-computer interaction
- text input
- cepstral features
- english text
- audio features
- text mining
- emotional state
- audio signals
- multilingual
- content-based video retrieval
- keywords
- digital audio
- facial expressions
- affect detection
- information retrieval
- high robustness
- affect sensing
- text data
- affective states
- audio content
- video search
- cross-modal
- lexical features
- spontaneous speech
- emotion classification
- visual data