Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision.

Andrew Shin, Masato Ishii, Takuya Narihira
Published in: Int. J. Comput. Vis. (2022)
Keyphrases
  • cross modal
  • multi modal
  • computer vision
  • image retrieval
  • perceptual information
  • multimedia databases
  • visual data
  • multimedia retrieval
  • knowledge base
  • high level
  • text classification
  • visual recognition