COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval.

Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen
Published in: CoRR (2022)
Keyphrases
  • cross modal
  • multi modal
  • information retrieval
  • computer vision
  • web pages
  • high level
  • low level
  • natural language processing
  • information retrieval systems
  • query expansion
  • action recognition
  • image understanding