Stacked cross-modal feature consolidation attention networks for image captioning.

Published in: Multimedia Tools and Applications (2024)

Keyphrases