Fully-attentive iterative networks for region-based controllable image and video captioning.

Published in: Comput. Vis. Image Underst. (2023)

Keyphrases