MoMo: A shared encoder Model for text, image and multi-Modal representations.
Rakesh Chada
Zhaoheng Zheng
Pradeep Natarajan
Published in:
CoRR (2023)
Keyphrases
multi modal
similarity measure
image classification
computer vision
image segmentation
high level
image analysis
audio visual
video search
multiscale
video sequences
motion estimation
video data
segmentation method
image content
semantic concepts