Publication: MSER: Multimodal speech emotion recognition using cross-attention with deep fusion.