VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.

Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu
Published in: CoRR (2023)
Keyphrases