Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition.
Hengshun Zhou, Jun Du, Yuanyuan Zhang, Qing Wang, Qing-Feng Liu, Chin-Hui Lee
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2021)