ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval.
Mengjun Cheng
Yipeng Sun
Longchao Wang
Xiongwei Zhu
Kun Yao
Jie Chen
Guoli Song
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
Published in:
CoRR (2022)
Keyphrases
cross modal
multi modal
multimedia retrieval
image retrieval
visual similarity
multimedia databases
computer vision
content based retrieval
visual data
low level
test collection