Word2Pix: Word to Pixel Cross-Attention Transformer in Visual Grounding.

Published in: IEEE Trans. Neural Networks Learn. Syst. (2024)

Keyphrases