The multi-modal fusion in visual question answering: a review of attention mechanisms.

Published in: PeerJ Comput. Sci. (2023)

Keyphrases