Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding.

Heng Zhao, Joey Tianyi Zhou, Yew-Soon Ong

Published in: CoRR (2021)

Keyphrases

co-occurrence
n-gram
word recognition
word sense disambiguation
neural network
visual features
visual information
selective attention
related words
fuzzy logic
visual attention
image pixels
sentence level