LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment.
Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Caiwan Zhang, Zhifeng Li, Wei Liu, Li Yuan
Published in: ICLR (2024)