Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers.
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
Published in:
CoRR (2020)
Keyphrases
multi modal
image pixels
video search
multiple modalities
neighboring pixels
pixel intensities
cross correlations
multi modality
cross modal
pixel values
image registration
audio visual
image regions
matching cost
semantic concepts
high dimensional
uni modal
image annotation
keywords
image sequences