VioLET: Vision-Language Efficient Tuning with Collaborative Multi-modal Gradients.
Yaoming Wang
Yuchen Liu
Xiaopeng Zhang
Jin Li
Bowen Shi
Chenglin Li
Wenrui Dai
Hongkai Xiong
Qi Tian
Published in:
ACM Multimedia (2023)
Keyphrases
multi modal
multi modality
cross modal
image annotation
computer vision
audio visual
high dimensional
semantic concepts
machine learning
xml documents