Learning to Generate Grounded Visual Captions Without Localization Supervision.

Published in: ECCV (18) (2020)

Keyphrases