MoMo: A shared encoder Model for text, image and multi-Modal representations.

Rakesh Chada, Zhaoheng Zheng, Pradeep Natarajan

Published in: CoRR (2023)

Keyphrases

multi modal
similarity measure
image classification
computer vision
image segmentation
high level
image analysis
audio visual
video search
multiscale
video sequences
motion estimation
video data
segmentation method
image content
semantic concepts