ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval.
Mengjun Cheng
Yipeng Sun
Longchao Wang
Xiongwei Zhu
Kun Yao
Jie Chen
Guoli Song
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
Published in:
CVPR (2022)
Keyphrases
cross modal
multi modal
multimedia retrieval
image retrieval
multimedia databases
computer vision
visual data
visual similarity
information retrieval
image segmentation
object recognition
distance measure
retrieval systems
scene images