Publication: Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.