Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision.
Andrew Shin
Masato Ishii
Takuya Narihira
Published in:
CoRR (2021)
Keyphrases
cross-modal
multi-modal
multimedia retrieval
computer vision
perceptual information
natural language
visual recognition
search engine
e-learning
high-level
text classification