VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Mingzhen Sun
Xinxin Zhu
Jing Liu
Published in:
NeurIPS (2023)
Keyphrases
computational model
database
information retrieval
multimedia
prior knowledge
vision system
multimodal
mathematical model
formal model
real time
similarity measure
cost function
statistical model
text documents
conceptual model