CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations.
Hang Li
Wenbiao Ding
Yu Kang
Tianqiao Liu
Zhongqin Wu
Zitao Liu
Published in:
EMNLP (1) (2021)
Keyphrases
cross modal
multi modal
semantic representations
image retrieval
visual recognition
multimedia retrieval
visual similarity
training set
natural language
visual data
supervised learning
multimedia databases
perceptual information
image features
image classification
higher level