Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog.

Zekang Li, Zongjia Li, Jinchao Zhang, Yang Feng, Jie Zhou
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2021)
Keyphrases