Multi-modal spatial relational attention networks for visual question answering.

Published in: Image Vis. Comput. (2023)
