mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
Chenliang Li, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye, Hehong Chen, Guohai Xu, Zheng Cao, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou, Luo Si
Published in: CoRR (2022)