Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection.
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Ming Yan
Shikun Zhang
Songhang Huang
Fei Huang
Published in:
CoRR (2024)
Keyphrases
image patches
language generation
text mining
english text
computational complexity
training set
feature vectors
natural images
low resolution
training samples
computational linguistics
information retrieval
image features
small number
neural network
linear combination
test images
multiscale
feature selection