InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
Published in: CoRR (2023)