Instruction-ViT: Multi-modal prompts for instruction learning in vision transformer.
Zhenxiang Xiao, Yuzhong Chen, Junjie Yao, Lu Zhang, Zhengliang Liu, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Dezhong Yao, Tianming Liu, Xi Jiang
Published in: Inf. Fusion (2024)